Batted Ball Data
January 11, 2006
I’ve got an article on Thursday at The Hardball Times about batted ball stats. In particular, I’ve posted a format for data tables, and I’d appreciate some comments about the usefulness of the tables, ways to improve them, etc. Please leave your comments here.
Once it’s posted, you’ll find the article at http://www.hardballtimes.com/main/article/how-do-you-like-your-data/.
Very interesting stuff. My question is which of these skills are under the control of the pitcher and which are chance/defense dependent (it’s hard to tell looking at the numbers of just 5 players). Obviously line drive rate, etc. is reproducible, but is the net runs per ball something under the control of the pitcher? And I’ll go out on a limb and make one observation: some of Santana’s success the past two seasons is due to groundballs. Am I reading the table correctly?
Hi Joe. Thanks.
Good point about Santana. In the last two years, his line drive rate has been down significantly and his groundball rate is up. That’s a great tradeoff.
I think you can say that the middle columns (batted ball rate, home run rate and strikeout rate) are totally under the control of the pitcher. However, the columns on the left and right, which show the ultimate value of each batted ball on average and in total, are also impacted by fielding.
I like the format; we can provide our own knowledge of what is predictive, just as we do with any other stat line. I do wonder why walk% is omitted, though. Otherwise, great work.
I like the format very much. Are the figures park-adjusted?
Thanks, guys. The run values per batted ball are park-adjusted, but the other figures aren’t (strikeout rate, batted ball frequency). I need to park-adjust the home run rates, but I’m not sure how far to go when applying park factors to all the other individual components.
As Greg Tamer has pointed out, I need to add BFP to the table, so you get a sense of how large each sample size is. I left out W% because there was only so much space, and K% is a driver of other outcomes. I figured the reader should be able to infer W% from the NIP run values compared to K%. Might be too much work, however.
Studes: This is really great data, and well presented. I posted these thoughts at BTF, but will add here as well:
1) I found the last 5 columns for net run values confusing. I’m not sure how they’re calculated or why the first 4 don’t add to the 5th. Pedro gives up 37.6 runs on LDs, for example—compared to what? These also might be easier to interpret if presented on a per-9IP basis, comparable to ERA and related stats (which also somewhat addresses Greg’s BFP point).
2) Personally, I’d find the net runs per ball figures more intuitive if they showed the absolute value of each BIP type, rather than the value above/below the average plate appearance. So, for example, it would show that Pedro’s average OF is worth .13 runs, vs. an avg value of .16. The average GB would be worth .02 runs (instead of a negative value), while for Pedro it’s 0.0. I know Studes doesn’t agree on this point, but I’d be curious to know if anyone else feels this way.
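To make the two presentations in point 2 concrete, here is a minimal Python sketch of switching between an absolute run value and a value relative to the average plate appearance. The function names and the `ASSUMED_AVG_PA_VALUE` baseline are hypothetical; the thread never states the baseline, so the 0.12 below is only a placeholder. The Pedro and league figures are the ones quoted in the comment.

```python
def to_relative(absolute_value, avg_pa_value):
    """Run value of a batted ball relative to the average plate appearance."""
    return absolute_value - avg_pa_value


def to_absolute(relative_value, avg_pa_value):
    """Inverse: recover the absolute run value from the relative one."""
    return relative_value + avg_pa_value


# Absolute values quoted in the comment above (runs per ball).
PEDRO = {"OF": 0.13, "GB": 0.00}
LEAGUE = {"OF": 0.16, "GB": 0.02}

# ASSUMPTION: placeholder baseline, not a figure from the article.
ASSUMED_AVG_PA_VALUE = 0.12

for ball_type in ("OF", "GB"):
    # Gap between pitcher and league in absolute terms.
    gap_absolute = PEDRO[ball_type] - LEAGUE[ball_type]
    # Same gap after both values are restated relative to the baseline.
    gap_relative = (to_relative(PEDRO[ball_type], ASSUMED_AVG_PA_VALUE)
                    - to_relative(LEAGUE[ball_type], ASSUMED_AVG_PA_VALUE))
    print(ball_type, round(gap_absolute, 3), round(gap_relative, 3))
```

Whatever baseline is used, the gap between a pitcher and the league is the same in both presentations; only the level (and possibly the sign) of each individual figure changes.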
The format, as almost everything else you do, is awesome. The only thing that would have made the article better? Throw in a player card of Felix, just to make everyone drool. His numbers are absurd.
Posted by David Cameron on 01/12 at 02:18 PM
Great work and presentation!
That the NIP runs and the overall runs match simply means that you’ve done a lot of work so that this work never needs to be redone.
The more interesting aspect is when you see a guy go from being a FB pitcher to a GB pitcher. It’s the little stories that are more interesting than the overall “single final number”.
Thanks a lot for the comments. The first four columns of total net runs don’t add to the fifth because there are other batted ball types (bunts and infield flies) that weren’t included in order to save space. They’re pretty meaningless.
I thought of adding an “average” line to the totals and basing it on the league averages in value per ball and frequency. Perhaps I should go back and do that to give readers a sense of what to compare the totals to. My original thought had been that the comparisons to league averages in run values per ball and frequency are better info, and I didn’t want to cloud the picture.
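One way such an “average” line could be built is sketched below in Python. This is only a guess at the arithmetic, assuming frequency is expressed per batted ball; the function name is hypothetical and every number is a made-up placeholder, not a figure from the tables.

```python
def league_average_line(batted_balls, league_freq, league_value_per_ball):
    """Net runs a league-average pitcher would post on each batted ball
    type over the same number of batted balls."""
    return {
        ball_type: batted_balls
        * league_freq[ball_type]
        * league_value_per_ball[ball_type]
        for ball_type in league_freq
    }


# ASSUMPTION: all values below are placeholders for illustration only.
LEAGUE_FREQ = {"LD": 0.20, "GB": 0.45, "OF": 0.35}     # share of batted balls
LEAGUE_VALUE = {"LD": 0.35, "GB": -0.10, "OF": 0.03}   # net runs per ball

baseline = league_average_line(500, LEAGUE_FREQ, LEAGUE_VALUE)
for ball_type, runs in baseline.items():
    print(f"{ball_type}: {runs:+.1f}")
```

A pitcher’s actual totals could then be read against this baseline row for the same number of batted balls.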
Regarding your second point, it’s not that I disagree with you. It’s that I didn’t write my original article that way (as you know) and I’ve decided to stick with the same methodology for now to avoid even more confusion.
I do think adding the lg averages for the “net” columns would help. At least, it would help me!
What do you think about presenting these net run values on a /9IP basis? Would seem to make them easier to interpret, especially if you want to profile relievers.
BTW, I wouldn’t say the IF/bunt figures are ‘meaningless’—looks like they save Pedro about 9 runs. But the tables are big already, so I’m not advocating you add that.
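For what it’s worth, the per-9IP idea is a one-line rescaling. Here is a minimal Python sketch; the function name is hypothetical, and the innings total is a placeholder since the thread doesn’t give the innings behind the table. The 37.6 is the line drive figure for Pedro quoted earlier in the comments.

```python
def net_runs_per_nine(net_runs, innings_pitched):
    """Rescale a net run total to a per-9-innings rate, comparable to ERA.

    innings_pitched is a true decimal (66 2/3 innings is 66.667),
    not the box-score notation 66.2.
    """
    if innings_pitched <= 0:
        raise ValueError("innings_pitched must be positive")
    return 9.0 * net_runs / innings_pitched


LD_NET_RUNS = 37.6        # Pedro's net runs on line drives, quoted above
ASSUMED_INNINGS = 200.0   # ASSUMPTION: placeholder, not from the table

ld_per_nine = net_runs_per_nine(LD_NET_RUNS, ASSUMED_INNINGS)
print(round(ld_per_nine, 2))
```

Because the rate divides out playing time, it would put starters and relievers on the same footing, which is the point about profiling relievers.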