The Complete Idiot’s Guide to Projecting Players

A very simplistic overview.

I was on vacation in Massachusetts the last two weeks.  Enjoyed it very much, thanks.  While browsing books at the Harvard Coop bookstore, I saw The Complete Idiot’s Guide to Statistics and decided to buy a copy.  Yes, I browse the mathematics section at bookstores.

I talk about statistics a lot on this blog, but I last took a statistics class over twenty years ago.  I’m pretty sure that I’ve forgotten everything I learned over twenty years ago, so I decided to buy the book to make sure I know what I’m talking about here.  I actually enjoyed reading the book and I’d recommend it for those who’d like to remember what they’ve forgotten from their old stats class.

And I realized that much of the book, particularly the part called Inferential Statistics, is exactly what baseball analysts are doing when they try to project player performances.

There was recently a five-part Projection Roundtable at the Hardball Times that focused on the current state of the art.  I don’t know about you, but much of that discussion was over my head; I haven’t spent a lot of time thinking about projections because I find the current state of baseball so fascinating.

But player projections are the most important task facing ballclubs, so I might start paying a bit more attention to the subject.  Along those lines, let me present the following, very simple, Player Projection Framework.  I’ll call it the Complete Idiot’s Guide to Player Projections.

Let’s say you want to know how many stars there are in the sky.  The problem is that you can’t count them all at once; you can only look at one small portion of the sky at a time, and it would take an eternity to take in the entire sky.  So you can never truly know how many stars there really are in the sky.

It’s the same thing with a baseball player.  A baseball player has what Tangotiger calls a “true talent” level.  When you look at a part of the sky, you’re only counting the stars in a sample of the total sky.  With a ballplayer, when you look at a season of 600 plate appearances, you’re only looking at a sample of his true talent level.  In both cases, the absolute truth can’t be directly measured.

This is a pretty common thing in statistics.  Statisticians are always talking about samples, sampling distributions and the sampling distribution of the mean.  There’s also this really important concept called the Central Limit Theorem, which says that the larger the sample size, the more closely the distribution of sample means follows a normal probability distribution.  Which means that, given enough plate appearances, you can treat the results of a player’s seasons as roughly normally distributed around his true talent level.  See? I did read the book.
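To put a rough number on that idea, here is a minimal Python sketch.  It assumes each plate appearance is an independent trial with a fixed chance of reaching base, which is a simplification, and the .350 on-base talent and 600 plate appearances are made-up illustration values, not figures for any real player.

```python
import math

def season_spread(true_rate, plate_appearances):
    """Standard deviation of an observed rate over one season,
    assuming independent plate appearances (binomial model)."""
    return math.sqrt(true_rate * (1 - true_rate) / plate_appearances)

# Hypothetical player: .350 true on-base talent, 600 plate appearances.
true_obp = 0.350
sd = season_spread(true_obp, 600)

# Under the normal approximation, about two-thirds of seasons land
# within one standard deviation of the true talent level.
low, high = true_obp - sd, true_obp + sd
print(f"one-SD range: {low:.3f} to {high:.3f}")
```

Under these assumptions, a true .350 player can post roughly a .331 or a .369 season on luck alone, and quadrupling the plate appearances only cuts that spread in half.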

Anyway, the basic process, for both baseball and the sky, is to estimate the larger population (true talent level or total stars in the sky) based on the samples you have, and then estimate the likely outcome (and potential range of outcomes) for the next “sample” (that is, the next piece of the sky or the next season).  And that’s the overview of the Complete Idiot’s Guide to Player Projections.

Here are some specific steps:

  • Estimate a player’s true talent level.
    • Take all the previous stats you have on a player.  The more, the better.
    • Adjust those stats for any bias in the data.  For instance, adjust the stats from the minor leagues, crazy ballparks, playing time against lefties and righties, etc. etc.
    • Regress your results to the mean of a comparable group of players.  You can just use all major league players, or you can choose a subset of players based on things like age, weight, or something else.  The more stats you have on the player, the less you have to regress toward the group’s mean.
    • You can do this for a player, or for each one of a player’s component stats (singles, doubles, home runs, strikeouts, etc.).
    • The result will be your estimate of the player’s “true talent level.”

  • Estimate changes to the player’s true talent level next year, based on age, injury or something else altogether (perhaps even “artificial enhancements”).

  • Thinking of next year as a sample of the true talent level, calculate the most likely outcome as well as the potential range of outcomes (perhaps expressed as one standard deviation).
    • The range of outcomes will depend on playing time assumptions.  It would be useful to express different ranges based on different amounts of playing time.
    • At this stage, it would also be nice to add in potential loss of playing time due to injury risk.  You could base this on the player’s history or by a comparison with similar players.
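
The three steps above can be sketched in a few lines of Python.  This is only one simple way to do it, not anyone’s actual projection system: the regression weight of 1,200 plate appearances, the league rate of .330, the aging adjustment and the sample player are all placeholder assumptions, and the function names are invented for illustration.

```python
import math

def estimate_true_talent(successes, trials, league_rate, regression_pa):
    """Step 1: regress the observed rate toward the league mean.

    Adding `regression_pa` plate appearances of league-average
    performance is one simple shortcut; the more real data a player
    has, the less the league average matters.
    """
    return (successes + league_rate * regression_pa) / (trials + regression_pa)

def adjust_for_next_year(talent, adjustment):
    """Step 2: shift the talent estimate for age, injury, etc.
    `adjustment` is an assumed input, e.g. -0.003 for a declining hitter."""
    return talent + adjustment

def project_range(talent, plate_appearances):
    """Step 3: most likely outcome plus a one-standard-deviation range,
    treating next season as a binomial sample of the talent level."""
    sd = math.sqrt(talent * (1 - talent) / plate_appearances)
    return talent - sd, talent, talent + sd

# Hypothetical player: reached base 660 times in 1,800 plate appearances
# (assumed to be already adjusted for park, league, and so on).
talent = estimate_true_talent(660, 1800, league_rate=0.330, regression_pa=1200)
next_year = adjust_for_next_year(talent, adjustment=-0.003)

# The range narrows as assumed playing time grows.
for pa in (300, 600):
    low, mid, high = project_range(next_year, pa)
    print(f"{pa} PA: {low:.3f} / {mid:.3f} / {high:.3f}")
```

Here the player’s raw .367 rate gets pulled down to an estimated .352 talent level before the aging adjustment.  Note that the range in this sketch only reflects sampling luck; it ignores the uncertainty in the talent estimate itself and any injury risk.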

I’m sure one of those fancy-pants sabermetricians will come along and correct me, but I think this is a pretty good framework for how to project player performances.  Some of the keys are how well you correct any bias in the original stats, your regression method, the population to which you regress, whether you do this for components or for overall players and how you estimate ongoing changes to the player’s true talent level.  At this stage, a breakthrough in any of those areas (not to mention the injury risk) would pretty much guarantee you a seat at the next Projection Roundtable.

Posted on 08/19 at 08:00 AM

I took three semesters of stats in grad school, and I have to admit, most of the time I was thinking things like, “So does this mean Tony Gwynn is significantly better than your average outfielder?”

Posted by Tom G at ballssticksstuff.com on 08/19 at 07:42 PM