The Baseball Graphs Blog
Thursday, August 31, 2006
Greg Maddux’s Great Game
I wrote about Greg Maddux’s 330th win yesterday in today’s THT Daily. My brother, who has been following the Dodgers closely for approximately 80 gajillion years, sent me the following e-mail with a great description of Maddux’s performance:
Maddux’s performance yesterday was far better than anyone has described. In addition to what you mentioned, consider:
1. The Dodger bullpen was in shock after the long extra-inning game the night before, so Maddux averaged something like 10 pitches an inning and rescued the entire pitching staff.
2. Maddux singled home the first run of the game with two out and took second on the throw on what should have been a close play.
3. Maddux started two double plays, one on a spectacular stab.
4. Maddux squeezed home a run!
5. In the bottom of the 7th, Maddux made an amazing play that even Vin Scully didn’t really fully appreciate. With two out and men on (I forget exactly), the batter lined sharply to Nomar, who seemed likely to glove it for the final out. I relaxed, Scott relaxed, and I’m sure that Vinny relaxed. But not Maddux! When the ball skipped off Garciaparra’s glove, Nomar picked it up but was too far from first to beat the runner. BUT MADDUX WAS THERE! Maddux had run hard to cover first even though it seemed obvious to everyone that Nomar would catch it.
6. When Maddux walked off the field after this play, over 30,000 Angelenos rose and gave him a standing ovation. If you’re from New York or Boston or Chicago or St. Louis, you won’t understand how important this last point is, but having lived in LA for 36 years, I can tell you that it was huge. We don’t do things like that. Maddux has revitalized the fans to the point where cheers break out without the organ helping us (sort of like they do in the East). From that point of view, he’s replaced Eric Gagne as the adopted hero of the fans, and I think you’ll see that this new level of support will help the team substantially.
Monday, August 21, 2006
WPA in Wins and Losses
One of the theories of win-based stats going around is that only contributions in wins should count toward a player’s wins, and vice versa for losses. Andy takes a look at WPA in Yankee wins and losses in his most recent Yankee WPA Rundown blog and finds some interesting things.
For instance, Mariano Rivera would rack it up in that sort of system, because he mostly enters games when the Yankees are ahead. The other interesting finding is that Derek Jeter stands out as someone who contributes a lot in both Yankee wins and losses. Fun stuff.
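To make the bookkeeping concrete, here is a minimal Python sketch of splitting a player’s WPA by game outcome: each game’s WPA goes into a “wins” bucket or a “losses” bucket depending on whether the team won. The per-game numbers are made-up placeholders, not Yankee data, and the function name split_wpa_by_outcome is hypothetical; it illustrates the idea, not Andy’s actual method.

```python
from typing import Iterable, Tuple


def split_wpa_by_outcome(games: Iterable[Tuple[float, bool]]) -> Tuple[float, float]:
    """Sum a player's per-game WPA separately for team wins and team losses.

    Each item is (player_wpa_in_that_game, team_won).
    Returns (wpa_in_wins, wpa_in_losses).
    """
    in_wins = 0.0
    in_losses = 0.0
    for wpa, team_won in games:
        if team_won:
            in_wins += wpa
        else:
            in_losses += wpa
    return in_wins, in_losses


# Hypothetical five-game log for one player (illustrative numbers only).
log = [(0.25, True), (-0.10, False), (0.05, True), (0.30, False), (-0.02, True)]
wins_wpa, losses_wpa = split_wpa_by_outcome(log)
print(f"WPA in wins: {wins_wpa:+.2f}, WPA in losses: {losses_wpa:+.2f}")
```

Under a wins-only system, only the first number would count toward the player’s wins, which is why a closer who mostly pitches with a lead would pile up credit there.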
Saturday, August 19, 2006
The Complete Idiot’s Guide to Projecting Players
I was on vacation in Massachusetts the last two weeks. Enjoyed it very much, thanks. While browsing books at the Harvard Coop bookstore, I saw The Complete Idiot’s Guide to Statistics and decided to buy a copy. Yes, I browse the mathematics section at bookstores.
I talk about statistics a lot on this blog, but I last took a statistics class over twenty years ago, and I’m pretty sure I’ve forgotten everything I learned in it. So I bought the book to make sure I know what I’m talking about here. I actually enjoyed reading it, and I’d recommend it to anyone who’d like to remember what they’ve forgotten from their old stats class.
And I realized that much of the book, particularly the part called Inferential Statistics, is exactly what baseball analysts are doing when they try to project player performances.
There was recently a five-part Projection Roundtable at the Hardball Times that focused on the current state of the art. I don’t know about you, but much of that discussion was over my head; I haven’t spent a lot of time thinking about projections because I find the current state of baseball so fascinating.
But player projections are the most important task facing ballclubs, so I might start paying a bit more attention to the subject. Along those lines, let me present the following, very simple, Player Projection Framework. I’ll call it the Complete Idiot’s Guide to Player Projections.
Let’s say you want to know how many stars there are in the sky. The problem is that you can’t count them all at once; you can only look at one small portion of the sky at a time, and it would take an eternity to take in the entire sky. So you can never truly know how many stars there really are in the sky.
It’s the same thing with a baseball player. A baseball player has what Tangotiger calls a “true talent” level. When you look at a part of the sky, you’re only counting the stars in a sample of the total sky. With a ballplayer, when you look at a season of 600 plate appearances, you’re only looking at a sample of his true talent level. In both cases, the absolute truth can’t be directly measured.
This is a pretty common thing in statistics. Statisticians are always talking about samples, sample distributions and the sampling distribution of the mean. There’s also this really important concept called the Central Limit Theorem, which says that as the sample size grows, the distribution of sample means (or totals) looks more and more like a normal probability distribution, no matter what the underlying population looks like. A season rate stat, such as on-base percentage over 600 plate appearances, is exactly that kind of average, which means you can treat the results of a player’s seasons as roughly normally distributed around his true talent level. See? I did read the book.
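A quick simulation can show what this means for a ballplayer. The Python sketch below assumes a hypothetical hitter whose true talent is a .340 on-base percentage and treats each of his 600 plate appearances as an independent trial at that rate (a simplification; real plate appearances aren’t perfectly independent). Across many simulated seasons, the season OBPs pile up in a bell curve around .340, with a spread close to the binomial formula sqrt(p(1 - p)/n).

```python
import random
import statistics


def simulate_season_rate(true_rate: float, plate_appearances: int, rng: random.Random) -> float:
    """One simulated season: each PA independently succeeds with probability true_rate."""
    successes = sum(1 for _ in range(plate_appearances) if rng.random() < true_rate)
    return successes / plate_appearances


TRUE_OBP = 0.340   # hypothetical true talent level
PA = 600           # one full season of plate appearances
N_SEASONS = 2000   # number of simulated seasons

rng = random.Random(2006)  # fixed seed so the run is repeatable
seasons = [simulate_season_rate(TRUE_OBP, PA, rng) for _ in range(N_SEASONS)]

observed_mean = statistics.mean(seasons)
observed_sd = statistics.stdev(seasons)
theoretical_sd = (TRUE_OBP * (1 - TRUE_OBP) / PA) ** 0.5  # binomial: sqrt(p(1-p)/n)
# Share of simulated seasons that land within one theoretical SD of true talent.
within_one_sd = sum(abs(s - TRUE_OBP) <= theoretical_sd for s in seasons) / N_SEASONS

print(f"Mean season OBP: {observed_mean:.3f}")
print(f"SD of season OBP: {observed_sd:.4f} (binomial theory: {theoretical_sd:.4f})")
print(f"Share of seasons within one SD: {within_one_sd:.0%} (a normal curve gives about 68%)")
```

Even with his talent fixed, this hypothetical hitter’s season OBP typically wanders about 19 points either way from .340, purely from sampling.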
Anyway, the basic process, for both baseball and the sky, is to estimate the larger population (true talent level or total stars in the sky) based on the samples you have, and then estimate the likely outcome (and potential range of outcomes) for the next “sample” (or, piece of the sky or season). And that’s the overview of the Complete Idiot’s Guide to Player Projections.
Here are some specific steps:
1. Estimate a player’s true talent level.
   - Take all the previous stats you have on a player. The more, the better.
   - Adjust those stats for any bias in the data. For instance, adjust for stats from the minor leagues, crazy ballparks, playing time against lefties and righties, etc.
   - Regress your results to the mean of a comparable group of players. You can just use all major league players, or you can choose a subset of players based on things like age, weight or something else. The more stats you have, the less you need to regress toward the group mean.
   - You can do this for a player’s overall performance, or for each of his component stats (singles, doubles, home runs, strikeouts, etc.).
   - The result will be the player’s “true talent level.”
2. Estimate changes to the player’s true talent level next year, based on age, injury or something else altogether (perhaps even “artificial enhancements”).
3. Thinking of next year as a sample of the true talent level, calculate the most likely outcome as well as the potential range of outcomes (perhaps expressed as plus or minus one standard deviation).
   - The range of outcomes will depend on playing time assumptions. It would be useful to express different ranges based on different amounts of playing time.
   - At this stage, it would also be nice to factor in potential loss of playing time due to injury risk. You could base this on the player’s history or on a comparison with similar players.
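For readers who think better in code, here is a minimal Python sketch of the framework for a single rate stat, on-base percentage. Everything specific in it is an assumption for illustration only: the season lines, the 1.05 bias factor, the .330 population mean, the 300 plate appearances of “ballast” used to regress to the mean, the -.003 talent change and the 10% injury risk. The regression step uses one common shortcut (adding a fixed number of population-average plate appearances to the player’s record); it is not the only way to regress, and the class and function names are hypothetical.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class SeasonLine:
    plate_appearances: int
    times_on_base: int
    # Hypothetical bias adjustment (park, league, platoon, ...). A factor above 1.0
    # means the raw numbers were inflated, so it is divided out. 1.0 = no adjustment.
    bias_factor: float = 1.0


def estimate_true_talent(history, population_mean, regression_pa):
    """Pool all adjusted seasons, then regress toward the population mean.

    regression_pa is an assumed number of "ballast" plate appearances at the
    population mean; the more real PAs a player has, the less the ballast matters.
    """
    total_pa = sum(s.plate_appearances for s in history)
    adjusted_on_base = sum(s.times_on_base / s.bias_factor for s in history)
    return (adjusted_on_base + regression_pa * population_mean) / (total_pa + regression_pa)


def project_next_season(true_rate, talent_change, planned_pa, injury_risk=0.0):
    """Shift true talent, then treat next season as a sample of it.

    Returns (most likely rate, one-SD low, one-SD high, expected PA).
    injury_risk is an assumed share of planned playing time lost to injury.
    """
    rate = true_rate + talent_change
    expected_pa = planned_pa * (1.0 - injury_risk)
    sd = sqrt(rate * (1.0 - rate) / expected_pa)  # binomial spread of a season rate
    return rate, rate - sd, rate + sd, expected_pa


# Hypothetical three-season history (the middle year in a hitter-friendly park).
history = [SeasonLine(600, 210), SeasonLine(550, 187, bias_factor=1.05), SeasonLine(400, 128)]
true_obp = estimate_true_talent(history, population_mean=0.330, regression_pa=300)

projections = {}
for pa in (300, 450, 600):
    projections[pa] = project_next_season(true_obp, talent_change=-0.003,
                                          planned_pa=pa, injury_risk=0.10)
    rate, low, high, exp_pa = projections[pa]
    print(f"{pa} planned PA ({exp_pa:.0f} expected): OBP {rate:.3f}, "
          f"range {low:.3f} to {high:.3f}")
```

The one-SD range gets narrower as playing time goes up, which is why it helps to state ranges for several playing-time assumptions rather than a single number.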
I’m sure one of those fancy-pants sabermetricians will come along and correct me, but I think this is a pretty good framework for how to project player performances. Some of the keys are how well you correct any bias in the original stats, your regression method, the population to which you regress, whether you do this for components or for overall players and how you estimate ongoing changes to the player’s true talent level. At this stage, a breakthrough in any of those areas (not to mention the injury risk) would pretty much guarantee you a seat at the next Projection Roundtable.
Friday, August 04, 2006
More Organizational Trees
In my latest Ten Things article on The Hardball Times, I noted that Will Young has built a pretty cool organizational tree that shows how each member of the Minnesota Twins’ roster had been acquired. And I wondered if there were any more trees like that.
Maybe there weren’t at the time, but a couple of guys have built them for their favorite teams. Ben Kabak posted one for the Yankees on his blog and a reader named Greg Sullivan created one for the Red Sox. Greg doesn’t have his own blog (there are still people without their own blog?) but he sent it to me and I thought I’d make it available for you here.
An organizational tree is an insightful way to look at a roster. But wouldn’t you know that the first two (after Will’s) would be for the Red Sox and Yankees?
Top Months So Far
Sox Watch went back to the Fangraphs WPA totals and calculated who has had the biggest WPA months so far. David Ortiz’s July was second only to Albert Pujols’s April. Pujols’s July was third-best, which surprised me a little given his time on the DL. Chase Utley’s July was fourth.
Here’s the top ten list (player, month, WPA):
1. Albert Pujols, April, 3.115
2. David Ortiz, July, 2.351
3. Albert Pujols, July, 2.144
4. Chase Utley, July, 1.929
5. Ryan Zimmerman, July, 1.869
6. Jason Schmidt, May, 1.864
7. Jermaine Dye, July, 1.808
8. David Ortiz, June, 1.802
9. Ryan Howard, May, 1.797
10. Jason Bay, May, 1.794
Wednesday, August 02, 2006
The First Players from Each Country
In a recent SABR-L discussion, home run king David Vincent listed, for each country of birth, the first player to appear in the major leagues. The list was inspired by the Indians’ Tom Mastny, who was born in Indonesia, making him the first major leaguer born in that country.
No, I had never heard of Dodecanese Island.
Country (date of first major league game): first player(s)
USA (5/4/1871): many players
England (5/5/1871): George Hall, Harry Wright
Ireland (5/5/1871): Andy Leonard
Cuba (5/9/1871): Steve Bellan
Netherlands (5/18/1871): Rynie Wolters
Germany (5/20/1871): George Heubel
France (4/26/1875): Larry Ressler
Canada (9/15/1875): Tom Smith
Scotland (5/20/1878): Jim McCormick
Australia (4/26/1884): Joe Quinn
Austria-Hungary (4/22/1885): Amos Cross
Sweden (9/23/1885): Charlie Hallstrom
Norway (9/8/1894): John Anderson
Wales (7/6/1896): Ted Lewis
Russia (8/20/1897): Jake Gettman
Colombia (4/23/1902): Louis Castro
Switzerland (8/3/1902): Otto Hess
Denmark (8/11/1911): Olaf Henriksen
Spain (5/16/1913): Al Cabrera
Atlantic Ocean (4/17/1914): Ed Porray
China (7/1/1914): Harry Kingman
Finland (8/28/1921): John Michaelson
Poland (9/19/1929): Henry Peploski
Italy (4/18/1932): Lou Polli
Mexico (9/8/1933): Mel Almada
Venezuela (4/23/1939): Alex Carrasquel
Czechoslovakia (9/22/1940): Elmer Valo
Puerto Rico (4/15/1942): Hi Bithorn
Dodecanese Island (9/23/1943): Al Campanis
Austria (4/21/1949): Kurt Krieger
Panama (4/20/1955): Humberto Robinson
Canal Zone (4/19/1956): Pat Scantlebury
Dominican Republic (9/23/1956): Ozzie Virgil
Bahamas (4/16/1957): Andre Rodgers
Virgin Islands (5/26/1959): Joe Christopher
Japan (9/1/1964): Masanori Murakami
American Samoa (9/16/1968): Tony Solaita
West Germany (8/2/1975): Rob Belloir
Nicaragua (9/14/1976): Dennis Martinez
Jamaica (4/10/1981): Chili Davis
Honduras (7/8/1987): Gerald Young
Curacao (8/23/1989): Hensley Meulens
British Honduras (7/5/1991): Chito Martinez
Afghanistan (5/2/1993): Jeff Bronkey
South Korea (4/8/1994): Chan Ho Park
Singapore (4/18/1996): Robin Jennings
Philippines (5/26/1996): Bobby Chouinard
South Vietnam (7/13/1996): Danny Graves
Belgium (8/25/1996): Brian Lesher
Aruba (9/3/1996): Gene Kingsale
Taiwan (9/14/2002): Chin-Feng Chen
Indonesia (7/30/2006): Tom Mastny