My Vacation with FIP and DER
December 29, 2003
If you’ve spent any time around this site (and God bless you if you have), you know that I like to talk about FIP and DER a lot. FIP and DER sound like some mythical creatures, maybe Pan and Echo, or Merry and Pippin, but they’re not. They’re a couple of pitching statistics that are pretty thoroughly explained in this article. So I won’t go into that here.
But I will say this: If I can only have two statistics about a batter, I’ll choose OBP and SLG; for a pitcher, I’ll choose FIP and DER. FIP and DER paint a fairly complete picture of a pitcher: how much he relies on his defense, how good he is on his own, how lucky or unlucky he’s been, how replicable his success is, etc. FIP and DER are also very important for our understanding of Win Shares, which splits responsibility for runs allowed between pitchers and fielders. So I took a chunk of my Christmas vacation to better understand these two numbers.
Here’s what I did. I looked at all pitchers from 1980 through 2002 with at least 130 innings pitched, which gave me 2,228 pitcher seasons. Seemed like enough. (Copyrights belong to, and thanks are given to, Sean Lahman). I compared FIP, DER and ERA for all these guys, and here’s what I found out.
First of all, FIP and DER account for about 82% of ERA variance by pitcher. In mathematical terms, the R squared of FIP and DER vs. ERA was .817. That’s a strong relationship, and confirmation that FIP and DER tell you most of what you need to know about a pitcher (albeit on a basic level).
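For anyone who wants to run the same kind of check, here is a minimal sketch of how an R squared like this can be computed with an ordinary least-squares fit. It assumes the numbers come from a plain linear regression of ERA on FIP and DER. The function name `r_squared` and the six data points are made up for illustration; they are not the 2,228 pitcher seasons.

```python
import numpy as np

def r_squared(predictors, target):
    """R squared of a least-squares fit of target on the predictors, with an intercept."""
    y = np.asarray(target, dtype=float)
    # Design matrix: a column of ones (intercept) plus one column per predictor.
    X = np.column_stack([np.ones(len(y)), *predictors])
    coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ coefs
    ss_res = float(np.sum(residuals ** 2))        # variation the fit fails to explain
    ss_tot = float(np.sum((y - y.mean()) ** 2))   # total variation in the target
    return 1.0 - ss_res / ss_tot

# Made-up numbers purely to exercise the function; NOT the 1980-2002 sample.
fip = np.array([-0.5, 0.2, 0.9, 1.1, 1.6, 2.4])
der = np.array([0.720, 0.705, 0.715, 0.690, 0.700, 0.695])
era = np.array([2.60, 3.40, 3.70, 4.30, 4.50, 5.20])

both = r_squared([fip, der], era)      # FIP and DER together
fip_only = r_squared([fip], era)       # FIP in isolation
der_only = r_squared([der], era)       # DER in isolation
print(round(both, 3), round(fip_only, 3), round(der_only, 3))
```

The same function handles the one-stat comparisons, so the combined and isolated figures come from an identical calculation.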
By the way, the DER stats in this analysis were somewhat limited, because it’s hard to get historic DPs, outfield assists and caught stealing by pitcher. I also should look at runs scored instead of earned runs. Maybe next time.
Anyway, when you look at these stats in isolation, FIP has a higher correlation with ERA than DER has. FIP achieved an R squared of .54 against ERA, while DER came in at .31. More about this later.
When predicting a pitcher’s ERA in the following season, current year FIP is actually a better predictor than current year ERA (R squared of .18 vs. .14). In that way, FIP works just like DIPS ERA, which is what Tangotiger created it to do.
There is no correlation between FIP and DER. None, nada, zip. The R squared between the two is .002. This is good, actually. It means that FIP and DER measure two completely different things.
As a next step, I computed an intermediate stat called DERA (a combination of DER and ERA; get it?). It equals ERA minus FIP. So it represents the portion of a pitcher’s ERA for which he shares responsibility with his fielders.
Across all pitchers, the average ERA was 3.98. The average FIP was 1.04 and the average DERA was 2.94. However, the distribution of each stat was different. Time for some graphs.
This is called a “box-whisker graph,” and it’s a superb tool for showing a statistical distribution. There are three blue boxes here, one for each stat. The blue line in the middle of each box is the median of that stat (which, in this case, is just about equal to the average of each stat). The top and bottom of each blue box mark the 75th and 25th percentiles (the third and first quartiles). So the middle 50% of each stat falls within its respective blue box.
There are lines sticking out of the top and bottom of each blue box. These extend, without any cross marks, for 1.5 times the difference between the 25th percentile and the 75th percentile (the interquartile range). This is the typical standard used to identify outliers. Finally, the cross marks that you see beyond each line extension are the outliers for each stat. Where the cross mark is bold, there are several outliers at about the same value.
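If the picture is hard to follow, the numbers behind a box-whisker graph can be worked out directly. The sketch below assumes the usual convention described above (quartiles for the box, fences at 1.5 times the interquartile range, anything beyond the fences counted as an outlier). The function name `box_whisker_summary` and the sample values are hypothetical, and graphing packages differ slightly in how they interpolate percentiles.

```python
import numpy as np

def box_whisker_summary(values, k=1.5):
    """Return the box edges, the outlier fences and the outliers for one stat."""
    data = np.asarray(values, dtype=float)
    q1, median, q3 = np.percentile(data, [25, 50, 75])
    iqr = q3 - q1                      # height of the box
    lower_fence = q1 - k * iqr         # farthest a whisker can reach downward
    upper_fence = q3 + k * iqr         # farthest a whisker can reach upward
    outliers = data[(data < lower_fence) | (data > upper_fence)]
    return {
        "q1": float(q1),
        "median": float(median),
        "q3": float(q3),
        "lower_fence": float(lower_fence),
        "upper_fence": float(upper_fence),
        "outliers": outliers.tolist(),
    }

# Made-up values, not taken from the pitcher sample.
summary = box_whisker_summary([1, 2, 3, 4, 5, 6, 7, 8, 9, 30])
print(summary)
```

In this toy list only the value 30 lands beyond the upper fence, so it is the one point that would be drawn as a cross mark.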
I hope this makes sense to you. I may be stretching the graphing frontier, but there are some basic lessons to be taken from this graph:
- The variances (size of the blue box) of FIP and DERA are relatively even, though FIP’s is slightly larger. ERA variance is larger, which is what happens when you put two unrelated stats together to form another.
- There are not many bottom outliers for ERA and DERA, because neither of those stats falls below zero. There’s a natural floor to how low ERA and DERA can go. This is not true for FIP, which does fall below zero.
It’s FIP that makes pitchers truly great. Generally speaking, pitchers in this sample did not get their ERA below 2.15 unless their FIP was negative. Once negative, ERA went as low as 1.53 (Dwight Gooden’s 1985).
That extreme bottom outlier with a FIP of -1.87 is Pedro Martinez’s extraordinary 1999 season. In 213 innings that year, Pedro struck out 313, walked 37 and gave up only nine home runs. That was an incredible year, the best FIP of any pitcher in the last two decades, by far. His overall ERA that year was 2.07, which means that his DERA was 3.94. His fielders’ DER was pretty poor, at .684.
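As a quick check on those numbers, here is a small sketch that rebuilds Pedro’s 1999 FIP and DERA from the totals in the paragraph above. It assumes the commonly cited form of Tangotiger’s formula, (13 x HR + 3 x BB - 2 x K) / IP, with no league constant added and no hit batters included. This article leaves the definition to the linked piece, so treat the formula as an assumption; it does reproduce the -1.87 quoted here. The function name `fip` is just for illustration.

```python
def fip(hr, bb, so, ip):
    """Assumed FIP form: (13*HR + 3*BB - 2*K) / IP, without a league constant."""
    return (13 * hr + 3 * bb - 2 * so) / ip

# Pedro Martinez, 1999, using the totals quoted in the text.
pedro_1999_fip = fip(hr=9, bb=37, so=313, ip=213)

# DERA is defined in the article as ERA minus FIP.
pedro_1999_era = 2.07
pedro_1999_dera = pedro_1999_era - round(pedro_1999_fip, 2)

print(f"FIP:  {pedro_1999_fip:.2f}")   # -1.87
print(f"DERA: {pedro_1999_dera:.2f}")  # 3.94
```

The negative sign comes entirely from the strikeout term: 626 points of credit for strikeouts against 228 points of debit for walks and home runs.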
The next Pedro year, 2000, was the second-best FIP year of the past two decades. Pedro’s FIP was -1.16. However, his fielders did much better in 2000, with a DER of .768, so his ERA was 1.74, making his DERA equal to 2.90.
I believe these are the two years that made Voros McCracken infamous.
I next wondered if FIP and DERA are related at all. My regression analysis indicated that no, DERA does not go up or down with FIP. Here’s a graph of the data points:
Pretty much a mess, right? No pattern there at all. And then it struck me:
Ignoring Pedro’s bizarre 1999 season, the variance in a pitcher’s DERA rises as FIP increases from the negative to the positive, and seems to max out at 1.00 (which is about the overall FIP average). At that point, the variance in DERA may actually decrease as FIP rises further.
Okay, this is a long, windy article, and I apologize for that. But I am coming to a point here, which is:
As a pitcher’s FIP decreases, defense becomes less relevant to the overall outcome. Not negative, not positive. Just less relevant.
This is not a new insight. In fact, it’s downright obvious. When pitchers strike out more batters and allow fewer walks and home runs, the impact of a hit that falls in for a single is diminished. There are fewer baserunners to drive in, and fewer home runs driving in the baserunners.
Here’s the math: the standard deviation for DERA is 0.5 for pitchers whose FIP is negative, and rises to .83 for pitchers with FIP over 2.5. The standard deviation for DER does not change as FIP changes, just DERA. We’re talking relevance and variance here.
By the way, DER is roughly the same for each group (actually, slightly higher for pitchers with good FIPs) and DERA is actually higher for pitchers with negative FIP (average DERA decreases from 3.06 to 2.86 as FIP increases).
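The grouping behind those figures can be sketched in a few lines. This assumes three buckets split at FIP of 0 and 2.5, the cutoffs mentioned above, and uses the sample standard deviation (whether the original analysis used the sample or population version isn’t stated). The function name `dera_by_fip_group` and the six seasons are made up for illustration only.

```python
from statistics import mean, stdev

def dera_by_fip_group(seasons, low=0.0, high=2.5):
    """seasons: iterable of (fip, era) pairs.
    Returns {group label: (count, mean DERA, sample std dev of DERA)}."""
    groups = {"negative FIP": [], "FIP 0 to 2.5": [], "FIP over 2.5": []}
    for fip, era in seasons:
        dera = era - fip               # DERA as defined in the article
        if fip < low:
            groups["negative FIP"].append(dera)
        elif fip <= high:
            groups["FIP 0 to 2.5"].append(dera)
        else:
            groups["FIP over 2.5"].append(dera)
    # stdev needs at least two values, so skip groups that are too small.
    return {
        label: (len(vals), mean(vals), stdev(vals))
        for label, vals in groups.items()
        if len(vals) >= 2
    }

# Made-up (FIP, ERA) pairs, NOT real pitcher seasons.
toy_seasons = [(-0.5, 2.4), (-0.2, 2.9), (1.0, 3.5), (1.2, 4.6), (2.6, 5.0), (3.0, 6.4)]
result = dera_by_fip_group(toy_seasons)
for label, (n, avg, sd) in result.items():
    print(f"{label}: n={n}, mean DERA={avg:.2f}, std dev={sd:.2f}")
```

Swapping DER in for DERA in the same loop would show whether the spread in fielding itself changes across the groups, which is the comparison the paragraph above describes.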
Enough already. I’ve tried to outline some of the key dynamics between pitchers and fielders, highlighting some of the key differences in variance between the pure pitching and fielding-influenced stats. There’s more analysis to come. But I’ll stop here for now.
Very nice article. The idea that you get diminishing returns on defense once you assemble a team of good pitchers and good fielders is interesting. In contrast, the impact of a good hitter is greater if surrounded by other good hitters.
So, let’s say you have a good, well-balanced .600 team, that scores .5 R/G above average and surrenders .5 R/G fewer than average. Where do you put resources to improve? Salaries are presumably based on players’ impact on an average team. So, this team should get more bang for their buck from upgrading their hitting, where the impact would be above average, than from upgrading their pitching or fielding.
But against that, you have Tango’s point that the win multiplier is higher for marginal reductions in RA than for marginal increases in RS. So maybe it’s a wash, and either strategy would deliver an equal W/$ return. Would be interesting to try to figure out, but it’s beyond my talents.
Posted by Guy on 12/31 at 10:58 AM
Thanks, Guy. One little caveat: I think I’m saying that you get diminishing returns on fielders as pitching improves. I wasn’t including good fielding.
You know, I don’t think this article hits the nail on the head. The conclusion bugs me; I don’t think I have it quite right. I’ll put a few comments up over at Primate Studies, where this has been posted. If others have any ideas, please let me know.
Posted by studes on 12/31 at 12:34 PM
I think both must be true: good fielders diminish the value of good pitching (allowing walks and HRs hurts you less), and vice-versa (allowing 1B/2B/3B hurts less if pitchers give up fewer BB/HR).
In terms of your broader concern, I wonder if it would help to use two measures on the same scale:
1) FIPERA (FIP + 3.20)
2) Def ERA (ERA minus FIPERA plus league-average-ERA)
If I did that right, you end up with the pitcher’s ERA if he had lg-avg defense, and the ERA if he had been a lg-avg pitcher (in FIP terms).
Posted by Guy on 12/31 at 03:05 PM