An Analytic Framework for Win Shares
October 27, 2003
Over the winter, I hope to play “what if” scenarios with Pete’s 2003 Win Share spreadsheets. In particular, I think it will be fun to tweak the Win Shares system to see if we can improve the output, or at least test the sensitivity of several variables. I plan to explore questions we might have, such as: What if we added loss shares? What if we changed the allocation between pitching and fielding? Etc.?
First, though, we need to develop some standards for Win Shares validity. We need a framework to guide our approach. And I have an idea.
In Win Shares, Bill James introduced the notion of Marginal Runs. In fact, the Win Shares approach is pretty much built on the Marginal Runs concept. Generally speaking, Marginal Runs Scored equals a team's runs scored above 52% of the league average, and Marginal Runs Allowed equals the amount by which a team's runs allowed fall below 152% of the league average. Add the two together for total Marginal Runs.
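As a rough sketch of that definition, here is the arithmetic in Python. The function name and the team totals are made up for illustration; only the 52% and 152% thresholds come from the description above.

```python
def marginal_runs(runs_scored, runs_allowed, league_avg_runs):
    """Return (marginal runs scored, marginal runs allowed, total).

    league_avg_runs is the league-average runs scored per team.
    """
    # Runs scored above 52% of the league average.
    marginal_scored = runs_scored - 0.52 * league_avg_runs
    # Runs allowed below 152% of the league average.
    marginal_allowed = 1.52 * league_avg_runs - runs_allowed
    return marginal_scored, marginal_allowed, marginal_scored + marginal_allowed


# Hypothetical team: 800 runs scored, 700 allowed, league average of 750.
scored, allowed, total = marginal_runs(800, 700, 750)
print(round(scored), round(allowed), round(total))
```

For this invented team, the total works out to 850 Marginal Runs: 410 on offense and 440 on defense.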
We won’t get into the 52% thing. At least not yet.
Marginal Runs are powerful. To quote from the book: ...the ratio of marginal runs to pythagorean wins is almost exactly the same, regardless of whether you are looking at a good team or a bad team!
This was certainly true in 2003. Here is a graph of marginal runs to pythagorean wins in the American League:
As you can see, Marginal Runs are a very good predictor of wins (at least, pythagorean wins). The R squared (the square of the correlation coefficient) for this dataset is .996, which is about as good as you can get. So Marginal Runs is indeed a great concept.
By the way, the chart and analysis for the National League show an equally strong relationship.
I should make a few points about the math. First, the dataset is polluted somewhat by interleague games (which means that the runs scored and allowed in the American League aren’t equal). When I’ve looked at self-contained leagues in the past, I’ve generally calculated R squared values of .998.
Second, we can simplify the Marginal Runs equation to Runs Scored minus Runs Allowed plus league-average Runs Scored. I’ll call this RS-RA+AvgRS. We can do this because the 152% and the 52% differ by exactly one: adding 152% of the league average and subtracting 52% of it leaves 100% of the league average. So the Marginal Runs calculation simplifies into run differential for each team plus a league-wide constant. James pointed this out in the book.
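Written out step by step, using the 52% and 152% thresholds from the definition of Marginal Runs, the simplification looks like this:

```latex
\begin{align*}
\text{Marginal Runs}
  &= (\text{RS} - 0.52 \cdot \text{AvgRS}) + (1.52 \cdot \text{AvgRS} - \text{RA}) \\
  &= \text{RS} - \text{RA} + (1.52 - 0.52) \cdot \text{AvgRS} \\
  &= \text{RS} - \text{RA} + \text{AvgRS}
\end{align*}
```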
Further, you can drop the AvgRS altogether and just use RS-RA and achieve the exact same correlation. That’s because AvgRS acts as a constant in a linear equation and doesn’t affect correlation. So the Marginal Runs equation basically boils down to this: Wins are a nearly exact function of Run Differential.
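If you want to check the constant-doesn't-matter claim yourself, here is a minimal sketch in Python. The run differentials, win totals and league average below are invented for illustration, not 2003 data, and `pearson_r` is just a hand-rolled helper for the ordinary correlation coefficient.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Made-up numbers for six hypothetical teams (not real 2003 data).
run_diff = [150, 80, 20, -10, -60, -180]
wins = [95, 90, 83, 79, 76, 63]
avg_rs = 750  # hypothetical league-average runs scored

# Marginal Runs = run differential plus the league-wide constant.
marginal = [d + avg_rs for d in run_diff]

r_diff = pearson_r(run_diff, wins)
r_marginal = pearson_r(marginal, wins)
print(r_diff, r_marginal)
```

Adding the same constant to every team shifts the mean by that constant, so each team's deviation from the mean is unchanged, and the correlation with wins comes out the same either way.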
This doesn’t undermine the validity of the Marginal Runs approach, in my opinion. I just wanted to make the math clear.
Anyway, I propose to use the Marginal Runs framework as a guideline to the validity of various aspects of Win Shares. If a calculation fits or enhances the relationship between marginal runs and wins, we should keep it. If it weakens the relationship, we should fix it. If the impact is unclear, well, we’ll see.
For example, here’s a graph of Marginal Runs vs. actual Wins in the American League.
Not as good a fit, because teams sometimes inexplicably veer from their pythagorean projections. The correlation coefficient for this data is .946.
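For readers who want to see what a pythagorean projection looks like in practice, here is a minimal sketch. It assumes the classic form of James' formula with an exponent of 2 and a 162-game schedule; this article doesn't say which exponent was used for the charts, and the team totals below are invented.

```python
def pythagorean_wins(runs_scored, runs_allowed, games=162, exponent=2.0):
    """Projected wins from runs scored and allowed.

    Assumes the classic exponent of 2; other exponents are also in use.
    """
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return games * rs / (rs + ra)


# Hypothetical team: 800 runs scored, 700 allowed, 88 actual wins.
projected = pythagorean_wins(800, 700)
actual_wins = 88  # made-up figure
print(round(projected, 1), round(actual_wins - projected, 1))
```

The gap between actual and projected wins is the veering referred to above: this invented team finished a few games under its projection.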
So here’s my first question: Should we correct the Win Shares methodology for this, since it doesn’t pass our validity test (that is, it weakens the relationship between Marginal Runs and “Wins”)? Well, not in this case, in my opinion.
Bill James clearly meant for Win Shares to reflect the actual performance of the team, that is, its actual win total. Players deserve credit, and blame, when their team overperforms or underperforms its won/loss projection. This is an aspect of Win Shares that observers sometimes forget.
Albert Pujols earned about 10% fewer Win Shares than he would have had he played with the San Francisco Giants, because the Cardinals were under their pythagorean projection by four games, while the Giants beat theirs by six and a half games. Is that fair? Well, maybe not. But Win Shares is clearly meant to be more of a value statistic than an ability statistic. And James’ stand is that value is best measured by win totals, that players should be judged by the number of games their teams win. It’s hard to argue with that.