Baseball Graph Details


Questions

  1. Why baseball graphs?
  2. So what's the theory behind your graphs?
  3. Exactly how do you build your graphs?
  4. Then why Win Shares?
  5. How are Win Shares calculated?
  6. Is it legitimate to calculate Win Shares inseason?
  7. Where can I find other baseball graphs?
  8. Who created your cool artwork?

Answers

Why baseball graphs?

Baseball fans have learned a lot over the past twenty years. Bill James, and many others, have helped many of us better understand the essential dynamics of this game we love. Their insights into how runs are scored, how teams win and lose, etc. etc. have enhanced our enjoyment of the game tremendously.

Unfortunately, the notion of "sabermetrics" is still held in contempt by many baseball fans. Even though many sabermetric insights are simple common sense, most fans (and even sportswriters) choose to ignore them.

I believe one of the problems is the way baseball information is presented. Baseball analysts like to research things in minute detail, presenting their results in nuanced numeric tables. The problem is that most of these tables are unintelligible to the average fan. And very few people have tried to bridge the gap between the things we've learned about baseball and the way we present baseball statistics.

So that's what I'm trying to do with this site. Over the years, I've learned that information is best processed through pictures, or in the case of numbers, graphs. In the business world, accountants and actuaries use numeric tables, but real business decision makers use graphs and conceptual charts. So I've tried to convey basic baseball information via a few graphs, focusing on team statistics.

The expert in this field is Edward Tufte, and I'd recommend his books to anyone who would like to pursue the same path.

So what's the theory behind your graphs?

Glad you asked. These graphs are organized around three fundamental insights that have emerged from baseball analysis.

First, the number of games that a team wins is generally attributable to the difference between that team's runs scored and runs allowed, also known as the Run Differential. In other words, teams win games by outscoring the other team. Over a full season, the teams with the most wins are those that have achieved the greatest total differential between runs scored and runs allowed. Sounds simple, right?

Unfortunately, a lot of fans often overlook this simple fact. Certain myths, such as "pitching is 95% of the game" stubbornly persist. I believe that one of the reasons they persist is that fans can't easily see the difference between runs scored and runs allowed for their favorite team. So my graphs are designed around the simple concept of Run Differential.

Second, the ability to score runs comes down to two things: getting on base and moving around the bases. These are represented by two well-known statistics: On Base Percentage (OBP) and Slugging Percentage (SLG). This is the reason you hear some baseball analysts cite OPS (OBP plus SLG) as a batting metric.

This insight may be a bit less apparent to you than Run Differential. Many fans focus on the "triple crown" of batting: Batting Average, home runs, and RBIs. These three stats have been ingrained into most baseball fans' minds as the most important batting stats of all.

However, batting average is not nearly as powerful a statistic as it appears. Many times in baseball history, the team with the most runs scored has not been the team with the best batting average. This suggests that the teams that get the most hits don't always win the game.

Also, RBIs are situational in nature. Good hitters tend to have lots of RBIs, yes, but only if they come to bat with lots of runners on base. RBIs tend to be a function of the batters in front of a hitter as much as the hitter himself.

As it turns out, OBP and SLG are two elegant offensive statistics. If you take the statistical totals of any league in baseball history, multiply its OBP by its total bases (the key component of SLG), you will get a number that is almost always within 1% of total league runs scored! When you apply this math to individual teams, you usually get a number within 5% of team runs scored.
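To make the arithmetic concrete, here is a minimal Python sketch of that estimate. The OBP formula is the standard one; the function names and the sample totals are made up for illustration and are not real team or league data.

```python
def on_base_percentage(hits, walks, hit_by_pitch, at_bats, sac_flies):
    """Standard OBP: times on base divided by the plate appearances that count."""
    times_on_base = hits + walks + hit_by_pitch
    chances = at_bats + walks + hit_by_pitch + sac_flies
    return times_on_base / chances


def estimated_runs(obp, total_bases):
    """Runs estimate described above: OBP multiplied by total bases."""
    return obp * total_bases


# Hypothetical season totals, for illustration only.
obp = on_base_percentage(hits=1400, walks=500, hit_by_pitch=50,
                         at_bats=5500, sac_flies=50)
runs = estimated_runs(obp, total_bases=2300)
print(f"OBP {obp:.3f}, estimated runs {runs:.0f}")
```

Comparing the estimate with a team's actual runs scored shows how far that team sits from the rule of thumb.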

This is an astounding mathematical concept. The person who discovered this basic truth must have felt like Archimedes, running down the hall naked and shouting "Eureka." The essence of offense comes down to two simple averages. So I have drawn graphs and accompanying tables that highlight them.

Third, allowing runs to score is a function of fielding and pitching, right? Well, one of the newer insights of sabermetrics is that pitchers may not have a lot of impact on balls hit in the park. That is, once a ball is hit fairly by a batter (and stays in the park) the likelihood of a single vs. an out may not depend a great deal on who threw the pitch.

There is a lot of research occurring in this field, and firm conclusions are elusive. Voros McCracken's article is the one that started the brouhaha. Another good article is Tom Tippett's research.

Still, I've built graphs around two metrics that divide responsibility for runs allowed between pitching and fielding. You can read more about the precise metrics below.

Bottom line, these graphs are designed to present a structured way for readers to see and understand each team's run differential and its contributing causes.

Exactly how do you build your graphs?

Now we're really getting into it.

Run Differential graphs are pretty simple, displaying Runs Scored and Allowed by team. Scoring runs is just as important as stopping runs from scoring, so the scale of the two axes is the same. The only wrinkle is that I have adjusted them by the average of the last three years' Park Factors at Baseball-Reference.com, to iron out differences between ballparks.

I've also added a number after each team name on the graph that is the difference between actual wins and projected wins, based on the Pythagorean Theorem, which is a formula that very accurately predicts wins based on Run Differential. Variances against the Pythagorean Theorem are often a function of "luck" and tend not to persist over time.
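For readers who want to see the projection worked out, here is a rough Python sketch. It assumes the classic form of the formula with an exponent of 2; the exact exponent used for these graphs isn't stated here, so treat it as an adjustable assumption. The function names and sample numbers are hypothetical.

```python
def pythagorean_win_pct(runs_scored, runs_allowed, exponent=2.0):
    """Projected winning percentage from runs scored and allowed.

    The exponent of 2 is the classic version (an assumption here);
    other values are sometimes used.
    """
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


def wins_above_projection(wins, losses, runs_scored, runs_allowed, exponent=2.0):
    """Actual wins minus projected wins, like the number shown after each team name."""
    games = wins + losses
    projected_wins = pythagorean_win_pct(runs_scored, runs_allowed, exponent) * games
    return wins - projected_wins


# Hypothetical team: 95-67 with 800 runs scored and 700 allowed.
diff = wins_above_projection(95, 67, 800, 700)
print(f"Wins above projection: {diff:+.1f}")
```

A positive number means the team has won more games than its Run Differential would suggest, and a negative number means fewer.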

OBP and SLG are self-explanatory, I hope.

Although attributing runs allowed to pitching and fielding is certainly not a straightforward task, I've chosen to use two metrics that seem best suited for the task. The results of an at bat can be separated into two pots: those in which only the pitcher and batter play a role (strikeouts, walks and home runs, broadly speaking) and those in which fielding also plays a role (Balls in Play, or BIP). To calculate the first pot of events, I used Tangotiger's Fielding Independent Pitching, which calculates the relative run impact of each event.

The calculation for FIP is simple: (13*HR+3*BB-2*K)/IP. The number you get from this calculation is the proportion of ERA that can be directly attributed to a pitcher. If you add 3.20 to FIP, this number answers the question: if the pitcher did not have the benefit of his fielders, how would he perform compared to an average team defense, including fielders?
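As a quick illustration, the FIP calculation above can be sketched in Python. The weights and the 3.20 constant come straight from the text; the function names and the sample pitching line are hypothetical. Note that innings pitched should be passed as a true decimal (200 and a third innings is about 200.333, not the box-score notation 200.1).

```python
FIP_CONSTANT = 3.20  # the constant quoted in the text above


def fip_component(home_runs, walks, strikeouts, innings_pitched):
    """(13*HR + 3*BB - 2*K) / IP, the pitcher-only part of runs allowed."""
    return (13 * home_runs + 3 * walks - 2 * strikeouts) / innings_pitched


def fip_on_era_scale(home_runs, walks, strikeouts, innings_pitched):
    """FIP plus 3.20, which puts the number on an ERA-like scale."""
    return fip_component(home_runs, walks, strikeouts, innings_pitched) + FIP_CONSTANT


# Hypothetical pitcher: 20 HR, 50 BB, 180 K in 200 innings.
print(round(fip_component(20, 50, 180, 200.0), 2))     # 0.25
print(round(fip_on_era_scale(20, 50, 180, 200.0), 2))  # 3.45
```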

For the second pot of events, I'm experimenting with several different metrics. On some graphs, I use Defense Efficiency Ratio (DER) as a proxy. DER, created by Bill James, is essentially a measure of the number of Balls in Play that are subsequently turned into outs by the fielders. DER is a function of fielders, pitchers, park and probably a few other things, but it's a decent indicator of fielding prowess. You can view the complete, up-to-date DER calculations by team at The Hardball Times.
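Here is a simplified Python sketch of the idea behind DER, meaning the share of Balls in Play that the fielders turn into outs. This is only an approximation under stated assumptions: it treats every ball in play as either a hit or an out and ignores details such as errors, so it is not necessarily the exact calculation published at The Hardball Times. The function name and sample numbers are hypothetical.

```python
def defense_efficiency_ratio(balls_in_play, hits_on_balls_in_play):
    """Share of balls in play converted into outs.

    Simplifying assumption: every ball in play is either a hit or an out
    (errors and other edge cases are ignored).
    """
    if balls_in_play <= 0:
        raise ValueError("balls_in_play must be positive")
    outs_on_balls_in_play = balls_in_play - hits_on_balls_in_play
    return outs_on_balls_in_play / balls_in_play


# Hypothetical team: 4,300 balls in play, 1,290 of which fell for hits.
print(f"DER: {defense_efficiency_ratio(4300, 1290):.3f}")
```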

Then why Win Shares?

Win Shares are the creation of Bill James, as articulated in his book of the same name. The basic idea of Win Shares is to credit individual players with the number of wins they contributed to the team, based on virtually everything they did while on the field: batting, pitching and fielding, even a little baserunning. Win Shares are the perfect complement to Baseball Graphs, because they calculate each of the sabermetric "truths" described above and attribute them to individual players on each team in one simple-to-understand number.

Each team's total wins is multiplied by three, and then distributed to individual players, based on their batting, fielding and pitching. There's no magic to the 3x multiplier, by the way. It's just done to create enough meaningful variance between players.
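The 3x step is easy to show in a few lines of Python. The sketch below computes a team's total pool of Win Shares and then splits it in proportion to a set of made-up contribution scores. The proportional split and the scores are purely hypothetical stand-ins: the real system assigns credit through a long series of batting, pitching and fielding calculations that this sketch does not attempt.

```python
def team_win_share_pool(team_wins):
    """Total Win Shares available to a team: three per win."""
    return 3 * team_wins


def split_pool(pool, contribution_scores):
    """Divide the pool in proportion to each player's score.

    Hypothetical stand-in for the real allocation method, which is
    far more involved than a simple proportional split.
    """
    total = sum(contribution_scores.values())
    return {player: pool * score / total
            for player, score in contribution_scores.items()}


# Hypothetical 90-win team with three made-up contribution scores.
pool = team_win_share_pool(90)
shares = split_pool(pool, {"Player A": 30, "Player B": 20, "Player C": 10})
print(pool)    # 270
print(shares)  # {'Player A': 135.0, 'Player B': 90.0, 'Player C': 45.0}
```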

It took Bill James about 100 pages to describe the entire methodology. And while there are certainly some flaws that will be corrected in the future with better data (such as play-by-play data), or new methodologies, it's a pretty intriguing system. The most thorough critique I have found of the Win Shares methodology was that conducted by Tangotiger and Rob Wood. Warning: this link is a 40-page, very theoretical PDF document.

How are Win Shares calculated?

Okay, here's how Bill James calculates Win Shares. Ready?

Now, what was your question?

Well, you may have a question about how we compute Win Shares today. In fact, we have made a few changes to the basic James formula, which are outlined in this article at the Hardball Times.

Interleague play makes it unlikely that a league's runs scored will equal its runs allowed (ditto hits, walks, etc.). In cases where Win Shares is unclear about using a league's runs scored or allowed, we have generally used the allowed figure.

Win Shares explicitly states that the fielding points on certain scales for certain positions are bounded. For example, catchers' points on the 50-point scale are bounded between 0 and 50. However, as far as I can tell, other points are not bounded (example: catchers' points on the 30-point scale). This isn't really much of a problem, except in the early parts of the season.

Is it legitimate to calculate Win Shares inseason?

There are a number of minor calculations in Win Shares that are based on an entire season. We've adjusted those calculations to handle inseason stats.

Some people seem to object to Win Shares being calculated in midseason. As a response, let me offer you this quote:

"Despite the 'Book of Values' given with this volume, you may wish to figure Win Shares for some other team, such as next year's San Diego Padres, as of the All-Star break, or your son's little-league team."
That's Bill James himself, on page 14 of Win Shares, giving implicit permission for inseason calculations.

Where can I find other baseball graphs?

There are several other resources for baseball graphs on the web. At the Hardball Times, I've got graphs that are updated daily for the current season. This site has a wonderful set of historical baseball graphs, including trend rates over time.

Another I've found was created by a math professor in Indiana, and it includes some neat historical graphs (such as total wins over the century by the original eight teams in each league).

If you're a Rangers fan, you also might enjoy these graphs.

Who created your cool artwork?

The banner and general look of this site were created by the talented Kasia. So far, Kasia's only exposure to baseball has been one viewing of Field of Dreams, and I haven't had the heart to tell her that catchers aren't lefthanded.

Some of the graphics are courtesy of Baseball History Info. I have also added colorized pictures of great ballplayers, courtesy of Portrait Matt.


That's it for now. I am constantly looking for ways to improve the usability of this site, so please send any comments or suggestions my way via email.