Colby Cash and Red Sox Nation have caught graphing fever.
Colby Cash created a graph of pitcher strikeout and walk rates from last year, and he’s been getting a lot of exposure through David Pinto and others. All I can say is “Bravo!” This is the way to get across information and let baseball fans see what’s really going on with their favorite teams and players.
Meanwhile, the guys at Red Sox Nation have picked up on the idea and created their own versions of Colby’s graphs, including career progressions. This is truly awesome.
Having said that, it’s easy to create graphs that don’t really tell the story, or that overwhelm the reader with information. So I thought I’d take my own crack at the graph, for what it’s worth.
Here is a graph of all 2003 AL pitchers who qualified for the ERA title. The pitchers who gave up fewer than one home run a game are in red; those who gave up more are in gray. Also, I’ve put the strikeout rate on the “x” axis and the walk rate on the “y” axis and added an average line for each one. Finally, I inverted the base on balls axis. Tell ya why in a second.
As you can tell, I only labeled the red dots for reference purposes. Labeling all dots became overwhelming.
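If you want to build a chart like this yourself, here is a rough Python sketch of the data prep. Treat all of it as assumption: the pitcher names and numbers are made up, the rates are computed per nine innings, the red/gray cutoff of one home run per nine innings is just one way to read "a game", and the averages are simple unweighted means of the plotted pitchers. It is not necessarily how the graph above was put together.

```python
from dataclasses import dataclass


@dataclass
class Pitcher:
    name: str
    innings: float      # decimal innings (200.0), not box-score notation like 200.1
    strikeouts: int
    walks: int
    home_runs: int


def per_nine(count, innings):
    """Convert a season total into a rate per nine innings."""
    return 9.0 * count / innings


def chart_points(pitchers, hr_cutoff=1.0):
    """Return one plot point per pitcher plus the two average-line values."""
    points = []
    for p in pitchers:
        hr9 = per_nine(p.home_runs, p.innings)
        points.append({
            "name": p.name,
            "k9": per_nine(p.strikeouts, p.innings),   # "x" axis
            "bb9": per_nine(p.walks, p.innings),       # "y" axis (drawn inverted)
            "color": "red" if hr9 < hr_cutoff else "gray",
        })
    # Unweighted means of the plotted pitchers, used for the average lines.
    avg_k9 = sum(pt["k9"] for pt in points) / len(points)
    avg_bb9 = sum(pt["bb9"] for pt in points) / len(points)
    return points, avg_k9, avg_bb9


# Hypothetical pitchers, invented purely for illustration.
SAMPLE = [
    Pitcher("Pitcher A", 200.0, 200, 50, 15),
    Pitcher("Pitcher B", 180.0, 100, 80, 30),
]

if __name__ == "__main__":
    points, avg_k9, avg_bb9 = chart_points(SAMPLE)
    for pt in points:
        print(pt)
    print("average lines:", avg_k9, avg_bb9)
```

Only the points marked "red" would get a name label, which keeps the chart readable.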
When developing graphs, it’s important to include context. That’s why the two average lines are important. You could add diagonal isobars, as I do in my team runs graphs, but it’s best if the lines add some sort of context (good/bad/whatever).
By inverting the base on balls line, I created a graph that puts the best pitchers in the upper right hand box. This is the way most people think; good is up and to the right, generally, and it makes sense to format a graph to reflect that bias.
In this graph, the best pitchers are in the upper right and the worst are in the lower left. Control pitchers are in the upper left, and strikeout/poor control pitchers are in the lower right.
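The four boxes can be spelled out as a small Python function. This is only a sketch: the function name, the example rates, and the choice to count a pitcher who sits exactly on an average line as above average are all my assumptions.

```python
def quadrant(k_rate, bb_rate, avg_k, avg_bb):
    """Name the box a pitcher lands in when the walk axis is inverted."""
    # More strikeouts than average plots to the right.
    right = k_rate >= avg_k
    # Fewer walks than average plots higher, because the walk axis is flipped.
    upper = bb_rate <= avg_bb
    if upper and right:
        return "best"
    if upper:
        return "control"
    if right:
        return "strikeout/poor control"
    return "worst"


if __name__ == "__main__":
    # Hypothetical rates per nine innings, with made-up averages of 6.5 and 3.2.
    print(quadrant(9.0, 2.0, 6.5, 3.2))
    print(quadrant(5.0, 2.0, 6.5, 3.2))
    print(quadrant(9.0, 4.5, 6.5, 3.2))
    print(quadrant(5.0, 4.5, 6.5, 3.2))
```

Notice that the walk comparison runs "backwards" compared with the strikeout one. That is the whole effect of inverting the axis: better, not more, ends up in the upper right.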
Instead of using Colby’s circles to indicate number of home runs allowed, I simplified the concept by identifying two groups: high home run rates and low home run rates. I think the circles work, but only if you have ten to fifteen data points. With several dozen on a chart, it’s hard to process the extra information imparted by the circle sizes.
By the way, this data is not adjusted for ballpark, and it should be.
Colby’s graph is a good example of the power of graphs. With one glance, you can see that Zito excels because he doesn’t give up home runs. That Victor Zambrano really ought to learn to keep the ball in the strike zone. And that Esteban Loaiza really did have a great year.
Graphs easily express relative quantitative information. If you want to understand raw numbers, use tables. But if you want to express relative amounts, or concepts, use graphs. They’re better, as Colby has shown.
Coda:
Colby was nice enough to look at the graph and give me feedback. The one thing he said was that he’d keep the strikeouts on the vertical axis, because top to bottom is more important than left to right.
The research I’ve read about this in the past has suggested that the horizontal axis is more important, but I’m kind of fuzzy on the point. So I redrew the graph, switching the axes. And I’ve got to admit that Colby has a point.
Any reactions?
Check out
http://www.premiersoccerstats.com/mlb/mlbKBBHR.html
which extends your graph for all years and franchises
Latest version of Flash required
Holy cow, Andy! That is truly awesome. Very nice work. I’m going to post a link to it over at The Hardball Times. Thanks!
Thanks for the comment. Let me know when the link is up. The HT is already a regular read (as well as your site, of course!). Good luck with the project.
I’ll keep you in touch with enhancements. As you can tell, English soccer is my main interest, but I will be paying more attention to other sports in the future.
Great work, Andy and Studes. Those are awesome three-D graphs, incorporating the three stats that the pitcher has the most control over.
I could look at those for hours on hours.
These graphs are awesome. However, I wonder why the walks allowed per inning doesn’t start at zero on the left. Wouldn’t that be more logical? The bottom left of a graph usually represents the zero point of both axes.
Hi Daser,
The graph is set up this way so that the best pitchers will appear in the upper right portion of the graph. The eye tends to look to the upper right as the “best”, and reversing the walks axis makes that happen.
Daser,
I’ve added a “switch axis” button so you can see the results in your preferred mode.
Andy Clark
Thanks for the responses. Personally, the alternate method makes more sense to me. I would be inclined to let the numbers speak for themselves, rather than switching the axis to accommodate a certain orientation of goodness. Since the viewer presumably understands that fewer walks = better, it seems they may more intuitively understand that the more left = better on the walks allowed axis. Personally, when I first looked at it the “backwards” x-axis was disorienting.
However, this is just me thinking aloud and is probably merely nit-picking. Keep up the good work, I think this kind of sabermetric visualization is an exciting (and largely unexplored) area of baseball knowledge.
Oy. See, that kind of thinking goes against everything I’m trying to do with this site.
If you are constructing graphs for statisticians, then keeping the “y” axis set to zero makes sense. But I’m trying to create graphs for the common baseball fan, because those are the folks who will gain the most from spatial representations of data.
Studies have shown that most people think up is good, and that things to the right are better (or, more recent). Usually, more means better, but not always. If you take a look around this site, you’ll see that I’ve consistently created graphs that put better, not more, in the upper right hand corner.
This may be confusing to the statistician, but it’s actually intuitive to most common readers. Doing this allows the average fan to glance at a graph and see who’s best, regardless of the underlying stats. Then, they can look at the stats in more detail if they’d like.
I don’t know if you’ve looked at the details page, but you can read more of this site’s goals. Also, you might want to reference the work of Edward Tufte.