Batter Types

February 21, 2006

Using Batted Ball Info to typecast batters.

One thing I’ve wanted to do is use the batted ball library to investigate ways to categorize batters.  When we think of pitchers, we automatically think of strikeout pitchers, or groundball pitchers.  But when we think of batters, we tend to think of home run hitters, or hitters for average.  Nothing wrong with that, but I’ve wondered if there’s a more insightful way to categorize the men at the plate.

So here’s what I did. 

First, I stared at the batted ball stats for many hours.  Then I started playing with the stats themselves.  That’s the best description I can give you.  In the end, I decided to focus on three keys to batter success:

- Control the plate, which includes not striking out and drawing the base on balls.
- Avoid groundballs or, as a corollary, beat out groundballs when you do hit them.
- Get the most out of your “air balls,” by hitting line drives, avoiding infield flies and (most importantly) hitting outfield flies for extra base hits (particularly home runs).

I next developed an index system that captures each of these metrics.  To do that, I created a dataset of all batters who appeared at the plate at least 500 times in the past four years.  I then calculated the average and standard deviation of park-adjusted figures for:

- Net runs per ball not in play (i.e., strikeouts, walks and HBP), as well as K/PA and BB/PA.
- Groundballs per batted ball, as well as net runs per groundball.
- Net runs per air ball (outfield flies, line drives and infield flies), as well as line drives per air ball, infield flies per fly ball and net runs per outfield fly.
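The data preparation described above can be sketched in Python. This is a minimal sketch under stated assumptions, not the code behind the spreadsheet: the `BatterLine` record, its field names and the helper functions are hypothetical, the counts are assumed to be park-adjusted already, walks and HBP are lumped into one field, and using the population standard deviation is an assumption (the post doesn't say which form was used).

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class BatterLine:
    """Hypothetical per-batter totals, assumed to be park-adjusted already."""
    name: str
    pa: int
    k: int
    bb: int            # walks plus HBP, lumped together in this sketch
    gb: int
    ld: int
    of_fly: int        # outfield flies
    if_fly: int        # infield flies
    nip_runs: float    # net runs on balls not in play
    gb_runs: float
    ld_runs: float
    of_runs: float
    if_runs: float


def rates(b):
    """Per-batter rate stats matching the list above."""
    air = b.ld + b.of_fly + b.if_fly      # air balls: line drives plus all flies
    fly = b.of_fly + b.if_fly
    batted = b.gb + air
    return {
        "runs_per_nip": b.nip_runs / (b.k + b.bb),
        "k_per_pa": b.k / b.pa,
        "bb_per_pa": b.bb / b.pa,
        "gb_per_batted": b.gb / batted,
        "runs_per_gb": b.gb_runs / b.gb,
        "runs_per_air": (b.ld_runs + b.of_runs + b.if_runs) / air,
        "ld_per_air": b.ld / air,
        "if_per_fly": b.if_fly / fly,
        "runs_per_of_fly": b.of_runs / b.of_fly,
    }


def league_baselines(batters, min_pa=500):
    """(mean, SD) of each rate stat across batters with at least min_pa PA."""
    qualified = [rates(b) for b in batters if b.pa >= min_pa]
    return {
        stat: (mean(r[stat] for r in qualified), pstdev([r[stat] for r in qualified]))
        for stat in qualified[0]
    }
```

Because every stat is a ratio, a batter with zero groundballs or zero fly balls would raise `ZeroDivisionError`; the 500-PA floor makes that unlikely, but a real pipeline would guard against it.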

Next, I computed each batter’s “Z Score” for each of these stats.  A Z Score measures how many standard deviations a batter is above or below average: the batter’s figure minus the average, divided by the standard deviation.  For a normally distributed stat, a Z Score of 0 equals average, 1 equals roughly the 84th percentile, 2 roughly the 98th percentile and 3 roughly the 99.9th percentile.  It’s a way of putting different stats on a common scale.  I then regressed each batter’s total net runs on the Z Scores of four key metrics (NIP, GB rate, net runs per GB and net runs per air ball) to weight the Z Scores appropriately into a single batting metric (let’s just call it our Index) that is equivalent to your basic runs created measure.  The “Total Index” is, in effect, a weighted average of the four Z Scores, with each weighted by how important it is.
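Here is a minimal Python sketch of the Z Score and weighting steps, using only the standard library. The percentile conversion assumes each stat is roughly normally distributed. The least-squares fit (with an intercept) is a plain normal-equations solve, written out so no third-party package is needed. `weighted_index` takes "weighted average" literally by dividing by the sum of the fitted weights; that is an assumption, since the post doesn't say exactly how the coefficients were scaled into the Index. All function names are hypothetical.

```python
import math
from statistics import mean, pstdev


def z_scores(values):
    """(x - mean) / SD for every batter, using the population SD."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


def percentile(z):
    """Percentile implied by a Z Score, assuming a normal distribution."""
    return 50.0 * (1.0 + math.erf(z / math.sqrt(2.0)))


def _solve(a, b):
    """Solve the square system a @ x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_weights(z_rows, net_runs):
    """OLS of net runs on the Z Scores plus an intercept: solve (X'X) w = X'y."""
    X = [[1.0] + list(row) for row in z_rows]
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * y for row, y in zip(X, net_runs)) for i in range(k)]
    return _solve(xtx, xty)  # [intercept, w1, w2, ...]


def weighted_index(z_row, weights):
    """Weighted average of one batter's Z Scores, intercept dropped.

    Assumption: the fitted coefficients are normalized to sum to 1.
    """
    coefs = weights[1:]
    return sum(w * z for w, z in zip(coefs, z_row)) / sum(coefs)
```

In practice `z_rows` would hold one `[NIP, GB rate, runs per GB, runs per air ball]` row per qualified batter. The GB-rate Z Score appears to be sign-flipped so that fewer groundballs counts as positive, which is the orientation the Casey and Choi examples imply.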

In case you’re worried whether this approach is valid, here’s a graph of how well the Index matches each batter’s total net runs per plate appearance:

[Graph: Total Index vs. Total Net Runs per plate appearance]

As you can see, the Index matches overall performance very well (R squared of .97).  That’s not surprising, because it was designed to.  But I just wanted to put your mind at ease.  You get one guess regarding the identity of the lone red triangle all by itself in the upper right.
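If you want to check a fit statistic like the R squared of .97 against the spreadsheet, here is a minimal stdlib sketch that computes it as the squared Pearson correlation between the two columns. For an ordinary least-squares line with an intercept, R squared equals this squared correlation, and it is unchanged by any linear rescaling of either variable. The function name is hypothetical.

```python
from statistics import mean


def r_squared(x, y):
    """Squared Pearson correlation between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences of at least 2 points")
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)
```

Usage would look like `r_squared(index_values, net_runs_per_pa)`, where both lists (hypothetical names) hold one value per qualified batter in the same order.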

Here’s an example of the final product: Sean Casey (0.2: 1.6, -0.5/-0.5, 0.0)

In this example, Sean Casey has an overall Index of 0.2, which is just slightly above average.  His strength is controlling the plate, where he has an index of 1.6.  Unfortunately, he hits a few too many balls on the ground and doesn’t have the speed to make them productive (-0.5/-0.5).  Lastly, and this is really key, his air balls are about average overall.
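The compact notation is easy to handle in code. Here is a minimal Python sketch, assuming the component order used above: Total Index, then plate control, then GB rate/net runs per GB, then air balls. The `BatterProfile` class and `parse_profile` helper are hypothetical, and the GB-rate sign convention (positive means fewer groundballs) is inferred from the Sean Casey and Hee Seop Choi write-ups rather than stated outright.

```python
import re
from dataclasses import dataclass


@dataclass
class BatterProfile:
    name: str
    total: float          # Total Index
    plate_control: float  # balls-not-in-play component
    gb_rate: float        # assumed: positive = fewer groundballs than average
    gb_runs: float        # net runs per groundball
    air: float            # net runs per air ball

    def __str__(self):
        return (f"{self.name} ({self.total:.1f}: {self.plate_control:.1f}, "
                f"{self.gb_rate:.1f}/{self.gb_runs:.1f}, {self.air:.1f})")


_NUM = r"(-?\d+\.\d+)"
# An optional colon after the name accepts both "Name (..." and "Name: (...".
_PATTERN = re.compile(rf"^(.+?):? \({_NUM}: {_NUM}, {_NUM}/{_NUM}, {_NUM}\)$")


def parse_profile(text):
    """Turn 'Name (total: plate, gb/gbruns, air)' back into a BatterProfile."""
    m = _PATTERN.match(text.strip())
    if m is None:
        raise ValueError(f"not a batter profile: {text!r}")
    name, *nums = m.groups()
    return BatterProfile(name, *map(float, nums))
```

For example, `str(BatterProfile("Sean Casey", 0.2, 1.6, -0.5, -0.5, 0.0))` gives back the string shown above, and `parse_profile` reverses it.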

Let’s pull another example: Hee Seop Choi (0.2: -0.6, 0.6/-1.0, 1.6)

Hee Seop Choi is another first baseman with a Total Index of 0.2.  But his profile is a little different.  He doesn’t control the plate particularly well (due to his high strikeout rate).  However, he does a better job of avoiding groundballs and does a much better job of getting some production out of his air balls.

Hopefully, you can see how this system might be useful.  In fact, imagine a world in which these stats were referred to as a matter of course, just like BA/HR/RBI are now.  Wouldn’t you know a bit more about each batter?

Here’s a spreadsheet that contains the Indices for all batters who qualified for the database.  Download and play with the data all you’d like.  Let me know if you have any questions or comments.

I have a few ideas of how to use this data to highlight different categories of batters, and I’ll be posting some of those in the next couple of weeks.



Dave,

Minor (unimportant) mistake: A player 2 SD above the mean is actually in the 97th percentile (97.5% of players are worse), 1 SD above the mean is in the 84th percentile, etc. The 68/95/99.7 rule refers to both tails of the distribution.

Posted by David Gassko  on  02/22  at  06:28 PM

Thanks, David.  I love the Internet.

Posted by  on  02/22  at  07:09 PM

This is terrific.  Can’t wait for the pitcher spreadsheet....  :>)

Posted by Guy  on  02/24  at  01:31 PM