Win Shares Replacement Level
December 02, 2004
Using Win Shares to establish a replacement level for baseball players, with some surprising results.
Last year, we spent some time developing a baseline Win Shares level for each player, and we implemented the methodology at the Hardball Times during the year. This baseline is equal to the number of Win Shares an average player would achieve, given that specific player’s playing time. This then led to “Win Shares Above Average,” an important way to interpret Win Share totals.
At the time, I felt this was an important step toward establishing replacement levels for individual players, and I sort of made a promise to myself that I would tackle the issue when I felt ready. See, replacement level is a very complex issue with no right or wrong solution; Bill James admitted as much in his Win Shares book. I believe his next version of Win Shares will include the concept of “Loss Shares,” which is different from Replacement Level, though it also provides important context to Win Share totals.
Now, I’m not claiming that I all of a sudden have great insight into this replacement level thing. But I do now have two years’ data to play with—enough to take a meaningful stab. So I’ve decided to go for it.
My general approach is the "Readily Available Talent" one taken by Keith Woolner in Baseball Prospectus 2002. Keith divided major league players into regulars and backups, and measured the distance between the two to determine replacement level. I won't go into the pluses and minuses of this approach -- Patriot's essay, cited above, does that extremely well.
But this approach is relatively easy to do (crucial word, relatively) with the Win Shares stats we've collected over the last two years. So here's what I specifically did:
- I separated all players into their primary position played (for example, Mike Piazza at catcher in 2003, first base in 2004).
- I picked the top sixty players at each position (one position times thirty teams times two years) in playing time, as measured by expected Win Shares. Expected Win Shares is essentially a measure of total plate appearances, innings in the field and/or innings pitched for each player. I called these the "Regulars."
- Win Shares groups all outfielders together, so I took the top 180 outfielders (three outfielders times thirty teams times two years). I'll talk about pitchers later.
- I then selected the same number of players, still in descending order of playing time, after the first sixty. These are the "Replacements."
- Said differently, I took the 120 players who played the most at each position and separated them into two groups based on playing time: the Regulars and the Replacements.
- Finally, I computed the Win Shares Percentage (WSP) of each group. WSP is a Win Shares rate stat, and it's simply Win Shares divided by two times expected Win Shares. You can think of it as a player's winning percentage, though it has its limitations in that sense.
- I divided the Replacements' WSP by the Regulars' WSP to determine the Replacement Level.
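The steps above can be sketched in Python. This is only an illustration under stated assumptions, not the code actually used for the article: the `Player` record, the function names, and the four toy players are all hypothetical, and the only formulas taken from the text are WSP = WS / (2 x ExpWS) and Replacement Level = Replacements' WSP / Regulars' WSP.

```python
from dataclasses import dataclass


@dataclass
class Player:
    name: str            # hypothetical label, for illustration only
    win_shares: float    # actual Win Shares (WS)
    expected_ws: float   # expected Win Shares (ExpWS), the playing-time measure


def group_wsp(players):
    """Win Shares Percentage of a group: total WS / (2 * total ExpWS)."""
    total_ws = sum(p.win_shares for p in players)
    total_exp = sum(p.expected_ws for p in players)
    return total_ws / (2 * total_exp)


def replacement_level(players, n_regulars):
    """Rank one position's players by playing time, split them into
    Regulars and Replacements of equal size, and compare their WSP."""
    ranked = sorted(players, key=lambda p: p.expected_ws, reverse=True)
    regulars = ranked[:n_regulars]
    replacements = ranked[n_regulars:2 * n_regulars]
    return group_wsp(replacements) / group_wsp(regulars)


# Toy data (made up): two "Regulars" and two "Replacements" at one position.
toy_position = [
    Player("A", win_shares=11.0, expected_ws=10.0),
    Player("B", win_shares=9.0, expected_ws=10.0),
    Player("C", win_shares=3.0, expected_ws=4.0),
    Player("D", win_shares=2.2, expected_ws=4.0),
]
toy_level = replacement_level(toy_position, n_regulars=2)
print(f"Replacement Level: {toy_level:.0%}")
```

For the real calculation, `n_regulars` would be 60 at each infield position and catcher, and 180 for outfielders, as described in the list.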
What did I find? Well, here's a table of the Win Shares (WS), Expected Win Shares (ExpWS), Win Shares Percentage (WSP) of each group, and the Replacement Level of each position:
A little more background: When I started making up Replacement Levels last year, I started at 50%, just on gut feel. Then, I later changed my mind and switched to 75%. Gut feel, again. Some gut. The answer was in between. Specifically, it looks like the Replacement Levels for outfielders and second basemen are around 70%, shortstops and third basemen around 60%, and catchers and first basemen in between.
This actually might explain a number of things. Why second basemen seem to be so underpaid, for instance (there are more backups available). Or why some shortstops have received nice contracts this offseason (not enough alternatives).
Having said that, I know there are all sorts of problems with this analysis. Injuries, poor distribution of talent and questionable playing time decisions all affect these calculations. Also, two-year samples are not really definitive. And this type of analysis is very sensitive to the number of players you select for each group.
But my gut (!) tells me that the right replacement level for Win Shares is between 60% and 70%, and I'd use 65% for all position players, keeping some of these specific ranges in mind.
What about pitchers, you ask? Good question. I took a slightly different approach with them, and I only used one year's worth of data. And, mimicking Woolner again, I separated pitchers into starters and relievers. The specific steps:
- First, I selected all pitchers who started a game in 2004, and rank ordered them by Games Started. I selected the top 150 (five starters times 30 clubs) and made them the "Regulars."
- For the Replacement group, I included all pitchers who made at least two starts, or one start if that was the only game they pitched. This resulted in 140 Replacements -- slightly fewer than in the Regulars group.
- For relievers, I rank ordered them by (Games Relieved plus Save-Equivalent Innings). I did this because expected Win Shares includes the impact of saves and holds, and pitchers on winning teams have more opportunities for saves and holds. On the other hand, I wanted to make sure pitchers who pitched key innings were ranked highly. Combining two figures like this led to a fairly representative sample across all teams.
- I then selected the top 120 pitchers (four pitchers in the bullpen for 30 teams) and labeled them Regulars; the next 120 were the Replacements.
- If a pitcher started more than ten games, or started more games than relieved, I excluded him from the Replacements group.
- For both groups, I then did the WSP analysis to establish a Replacement Level for each.
Here are the surprising results:
The replacement level for relievers is within the same range as position players (62%), but starting pitchers present an entirely different story (38%)! In a sentence, good starting pitchers are hard to find. I should reiterate that there are issues with this approach; all of the previous caveats apply, plus I only used data from one year. My next step will be to pull 2003 data and run the same analysis.
But if this analysis holds water, I will add replacement levels to the Win Shares tables, using something conservative like 45% for starting pitchers and 65% for everyone else. We'll call this Win Shares Above Replacement (WSAR), and I like to think it will be a major Win Share step forward.
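As a rough sketch of how such a stat could be computed: the article does not spell out the WSAR formula, so the version below is an assumption, namely that a replacement player earns the replacement level times the expected Win Shares for the same playing time, and WSAR is whatever the player earned beyond that. The function name and the sample player are hypothetical; only the 45% and 65% levels come from the text.

```python
# Replacement levels proposed in the article.
STARTER_LEVEL = 0.45
EVERYONE_ELSE_LEVEL = 0.65


def wsar(win_shares, expected_ws, is_starting_pitcher):
    """Win Shares Above Replacement, ASSUMING the replacement baseline is
    level * expected Win Shares (this formula is not given in the article)."""
    level = STARTER_LEVEL if is_starting_pitcher else EVERYONE_ELSE_LEVEL
    return win_shares - level * expected_ws


# A made-up player with 20 Win Shares in 12 expected Win Shares of playing time.
as_hitter = wsar(20.0, 12.0, is_starting_pitcher=False)
as_starter = wsar(20.0, 12.0, is_starting_pitcher=True)
print(f"position player: {as_hitter:.1f}, starting pitcher: {as_starter:.1f}")
```

Under that assumption, the same Win Share total is worth more above replacement for a starting pitcher, because the bar he is measured against is lower.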
One of the major Win Share complaints is that they underrate good starting pitchers, and this analysis seems to bear that out. But I might characterize the situation differently.
Win Shares are an attempt -- and a pretty decent one -- to assign each player's contribution to his team's wins. It doesn't matter what position the player plays, or whether he contributes with his arm, glove or bat. It doesn't matter how hard it is to do what he did, or how rare his particular skill is. What matters, within the parameters of each game and how it was played, is what he contributed to the win.
But if a player can do something that contributes to a win, and very few other people can do it, doesn't that make him more valuable? Said differently, what if Player One contributes 20 Win Shares, and there are a bunch of guys who could only contribute 10 in his place; isn't he worth more than the player who contributes 20 Win Shares, but is backed up by a bunch of guys who could contribute 15 in his place?
Well, yes, to answer a rhetorical question. Yes he is. And this is what WSAR is meant to measure. Win Shares measures how much a player contributed to his team. WSAR measures how rare his talent is among players who play the same role. Contracts are driven by WSAR, because contracts are subject to the laws of supply and demand just like any other market. What's rare is valuable, what's common is less valuable.
Both stats are useful; they measure different things.
I think this is very good work. The revelations about starting pitchers are very interesting.
Is there any way you can determine what the dropoff is from a #1 starter (say the top 30 in WS in any given year) to the #2 starter (31-60) and so on? I wonder if the drop-off from #1 to #2 might be in the 65% range, or maybe that’s the dropoff from #1 to #3, and it’s only 10% from #1 to #2, who knows ...
It would also be fascinating to see numbers for periods when the four-man rotation was more in vogue.
Posted by Black Hawk Waterloo
on 12/02 at 06:49 PM
You have *huge* selective sampling issues that you have not addressed.
Leaving that aside, I will come back with some of my own comments.
Let’s take it as a truism that
- a team of all-replacement pitchers, with a group of average hitters and average fielders would play .420 ball.
- a team of all-replacement nonpitchers, with a group of average pitchers will play .380 ball
The number of wins above replacement for a team of average players would give you:
- average team of nonpitcher: .500 - .380 = +.120
- average team of pitchers: .500 - .420 = +.080
Based on this example, the nonpitchers get 60% of the wins above replacement.
When you have a team of replacement level players, using the Odds Ratio Method, you get a win % of .307, or 49.8 wins, or 31.2 wins below average.
Since we forced our way into the 60/40 allocation between nonpitchers and pitchers, our team of average nonpitchers gets 18.7 wins above replacement, and our team of average pitchers gets 12.5 wins above replacement. Multiplying by 3 win shares per win, our team of average nonpitchers gets 56.1 WS above replacement, and our team of average pitchers gets 37.5 WS above replacement.
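The arithmetic so far can be checked with a short Python sketch. The variable and function names are mine, and the 162-game season is an assumption implied by the 49.8-win figure; the .420/.380 premise and the 3 win shares per win come from the comment. Small differences from the figures quoted above (for example 56.2 rather than 56.1) are rounding.

```python
def odds_ratio_combine(p1, p2):
    """Odds Ratio Method: multiply the odds of two winning percentages
    (each measured against a .500 opponent) and convert back to a percentage."""
    odds = (p1 / (1 - p1)) * (p2 / (1 - p2))
    return odds / (1 + odds)


GAMES = 162          # assumed season length
WS_PER_WIN = 3       # win shares per win

repl_pitchers = 0.420      # replacement pitchers, average everyone else
repl_nonpitchers = 0.380   # replacement nonpitchers, average pitchers

# Share of wins above replacement credited to nonpitchers (the 60/40 split).
gap_nonpitchers = 0.5 - repl_nonpitchers
gap_pitchers = 0.5 - repl_pitchers
nonpitcher_share = gap_nonpitchers / (gap_nonpitchers + gap_pitchers)

# An all-replacement team, and how far below average it finishes.
team_pct = odds_ratio_combine(repl_pitchers, repl_nonpitchers)
wins_below_avg = GAMES * (0.5 - team_pct)

nonpitcher_war = wins_below_avg * nonpitcher_share
pitcher_war = wins_below_avg * (1 - nonpitcher_share)
nonpitcher_wsar = nonpitcher_war * WS_PER_WIN
pitcher_wsar = pitcher_war * WS_PER_WIN

print(f"all-replacement team: {team_pct:.3f}, {GAMES * team_pct:.1f} wins")
print(f"wins above replacement: {nonpitcher_war:.1f} / {pitcher_war:.1f}")
print(f"WS above replacement: {nonpitcher_wsar:.1f} / {pitcher_wsar:.1f}")
```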
Win Shares hands out 243 win shares per average team. And, historically (I think), they give out 64% of those to nonpitchers and 36% to pitchers. That’s 87.5 WS per team of pitchers, and 155.5 WS per team of nonpitchers.
To align WS into WS above replacement, we have to satisfy our findings here. So, we have a team of average pitchers getting 87.5 WS, and that same team of pitchers being 37.5 WS above replacement. Therefore, the replacement level for pitchers is 40 WS. 40/87.5= 46%
Doing the same calculation for nonpitchers (and using the 155.5 and 56.1 figures), and we get 64%.
As you can see, if our premise is correct about the .420/.380 split, then WS severely undervalues pitchers. Studes' findings here, though I haven’t addressed the mechanics of it, would support my assertion.
What if Bill James were to give out 40% WS to pitchers instead? Well, now he would be giving out 97.2 WS to the average team of pitchers. Being 37.5 WS above replacement means that the replacement level for pitchers is now 61.4%. Repeating the step for nonpitchers (145.8 WS for average nonpitchers, and 56.1 WS above replacement), and we get 61.5%.
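To see how sensitive the replacement percentages are to the pitcher allocation, here is a small Python sketch. The function names are hypothetical; the inputs (243 win shares per average team, 37.5 and 56.1 WS above replacement) are the figures used in this comment. Replacement level is taken as (total WS minus WS above replacement) divided by total WS.

```python
TEAM_WS = 243              # win shares handed out to an average team
WSAR_PITCHERS = 37.5       # average pitchers' WS above replacement
WSAR_NONPITCHERS = 56.1    # average nonpitchers' WS above replacement


def replacement_pct(total_ws, ws_above_replacement):
    """Replacement-level WS as a fraction of the average group's WS."""
    return (total_ws - ws_above_replacement) / total_ws


def levels(pitcher_share):
    """Return (pitcher level, nonpitcher level) for a given share of
    team win shares allocated to pitchers."""
    pitcher_ws = TEAM_WS * pitcher_share
    nonpitcher_ws = TEAM_WS - pitcher_ws
    return (
        replacement_pct(pitcher_ws, WSAR_PITCHERS),
        replacement_pct(nonpitcher_ws, WSAR_NONPITCHERS),
    )


for share in (0.36, 0.40):
    pitchers, nonpitchers = levels(share)
    print(f"{share:.0%} to pitchers: {pitchers:.1%} vs {nonpitchers:.1%}")
```

With 36% allocated to pitchers, the pitcher level comes out near 57%, in line with the 50/87.5 correction posted later in this thread rather than the 46% figure; at 40% the two levels nearly coincide.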
If I hadn’t rounded, we’d end up with the exact same numbers. Heck, we can probably tweak the numbers to end up with 61.8%, which is an accidental Fibonacci number that James would appreciate.
To support the 64/36 breakdown of WS, a team of average nonpitchers with replacement-level pitchers would need to win .428, while a team of average pitchers with replacement-level nonpitchers would need to win .372
As you can see, these numbers are pretty close to my initial premise of .420/.380. A small change in the model has a huge impact in results.
For either Bill James or me to make the claim of .420/.380 or .428/.372, we need to back it up.
Correction: “is 40 WS. 40/87.5 = 46%”
should be 50/87.5 = 57%
Teams are in the enviable position of “trying out” starting pitchers and letting the cream rise to the top. Being allowed to select, after the fact, the 150 starters with the most starts is the huge selective sampling issue. If you want to do it right, select the 150 regular starters BEFORE the season begins. If a guy is not good enough to be considered one of the top 150 starters, yet somehow manages to make 20 starts, then either (1) your evaluation of pitchers was wrong, or (2) he got lucky.
But, by selecting them after the fact, you are automatically assuming it is #1.
I think this issue is more prevalent with pitchers than regulars, simply because of the 5 spots you can use for starters, while you don’t have that “tryout” luxury with nonpitchers (not as much anyway).
What about rookies like Webb and Willis in 2003, who are not readily available, but might not have been considered regulars going into 2003? We can try to fix that by making an allowance. Say that historically, you get 6 legitimate rookies who become regulars in the year in question, and the next year. That is, they are no flash in the pan, but we could not have predicted them.
For this problem, I’d say to line up all the rookies who did pitch at least 1 game, order them by Baseball America rankings, and grab the top 6 (or whatever you decided), and make those your “regulars”. Any rookie who managed to pitch more than these 6 “rookie studs” but was not part of BA’s top 6 list would rightfully be classified as a replacement.
Or, perhaps even better, just simply take all rookies out.
Tango, thanks for all your comments. Very good one about the rookies—it should be relatively easy for me to take them out retrospectively. I actually doubt it will make much difference in 2004, but it might in 2003. Maybe at the same time, I can do some of the analyses Waterloo is talking about.
I need to think about your mathematical comments (as always!). Have you ever run WE for an entire year, by player? I wonder how the distribution among different elements plays out if (and it’s a big if), credit for fielding is allocated appropriately?
Posted by studes
on 12/03 at 03:26 PM
Rookies: don’t forget you have to do BOTH things… remove the rookies, AND establish the 150 starters at the beginning of 2004. Just removing the rookies will have almost no impact, and you’d be wasting your time.
WE: yes, I have that for all players from 99-02.
Are you asking what happens when we allocate fielding, what kind of distribution do we get for nonpitchers and pitchers WE?
Don’t forget, that will still be replete with sampling issues.
My 60/40 level was also described here:
I have other work that I’ve done, not published, or published here and there, that also leads me to 60/40.
I haven’t really done the necessary rigorous work, yet, for me to say 60/40, but I’m more confident in that than in James’ 64/36.
There are 2 issues here: 1) do pitchers and non-pitchers have the same repl level, and 2) even if they do, does that mean that they will have the same repl within the WSh system?
Tango is addressing #1, and I think he is correct about the .420/.380 (although I think .420/.375 might be a bit better).
There is no reason to assume that the repl level is the same for pitchers and non-pitchers, because there is no reason to assume that the distribution of talent for pitching vs hitting+fielding is the same, even in MLB ball.
So, you have to take care of both #1 and #2 in the establishment of WSh repl levels.
Studes, I also have a little post on the relevant fanhome thread.
I lay into the fielding portion of Win Shares here:
Posted by tangotiger on 12/09 at 09:10 AM