Everyday Stats and Their Flaws
Stats We're Talking About: ERA
While understanding how pitcher W-L records aren't helpful isn't that difficult, ERA takes a bit more explaining. W-L records incorporate more than just the pitcher's talent - defense, offense, bullpen, random variation, weather - and if it includes all of that, we can reasonably agree that other stats probably tell us more about a pitcher than that. That's where ERA comes in.
ERA is pretty simple. You take the number of earned runs allowed, divide by the number of innings pitched, and multiply by 9 (ER / IP * 9). Because the objective of the pitcher is to prevent runs, this formula makes a lot of sense. The "earned runs" part takes care of the defensive miscues that cause additional runs. The "innings pitched" covers the amount of time the pitcher spent on the mound. And the "9" turns the result into the number of runs that pitcher will give up every 9 innings on average. That's the way it looks, anyway.
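As a quick illustration of the arithmetic, here is a minimal Python sketch of the formula above. The function name `era` and the sample season line are made up for illustration, and innings are assumed to be passed as a true decimal (6 and 1/3 innings as 6 + 1/3, not the box-score shorthand 6.1).

```python
def era(earned_runs: float, innings_pitched: float) -> float:
    """Earned run average: earned runs allowed per nine innings (ER / IP * 9)."""
    if innings_pitched <= 0:
        raise ValueError("innings_pitched must be positive")
    return earned_runs / innings_pitched * 9


# Hypothetical season line (not real data): 70 earned runs over 200 innings.
sample_era = era(70, 200)
print(f"ERA = {sample_era:.2f}")
```

Note that nothing in the inputs says who was playing defense, which park the innings came in, or what the bullpen did with the runners left on base.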
But it doesn't actually work that way. Let's look at defense first. Accounting for errors makes some sense because errors aren't the pitcher's fault, but there are times when the pitcher really does earn the runs that score after an error and just got lucky that the error happened with two outs, which keeps those runs from hurting his ERA. And earned runs don't account for the actual quality of the defense.
Going back to the example I used last week, put a defense of Chris Johnson, Tyler Pastornicky, Dan Uggla, and Eric Hinske behind Tim Hudson. Now put Johnson, Andrelton Simmons, Elliot Johnson, and Freddie Freeman behind Hudson. With which group do you think Hudson has the lower ERA? The second, probably, but the change in ERA has nothing to do with the quality of Hudson's pitching. It has to do with the quality (or lack thereof) of defense behind Hudson. So while "earned" runs take defense into account somewhat, they cover only a small portion of what defense actually does.
The next issue is that the bullpen is still involved. If the pitcher leaves at the end of the inning, there's no issue. But what about when the pitcher leaves in the middle of an inning and runners are on? If the reliever allows those players to score, then the first pitcher receives all of the blame. While he certainly deserves some of it, the reliever does as well. Here's a list of starting pitchers and how many "Bequeathed runners" scored (BQS). Juan Nicasio, for instance, gave up 13.4% of his runs by inherited runners. Both pitchers deserve some blame, but only one receives it.
One of the bigger issues is the difference in parks. Having a 3.50 ERA in Coors Field is more impressive than doing it in Petco Park, but ERA doesn't really look at that. Depending on what type of pitcher you are, a park can really ruin you or help you, but it doesn't change the actual quality of your performance as the design and atmosphere of the park are out of your control.
Random variation comes in next. We discussed this a bit with BABIP last week, but luck and random events don't have to just "even out" over the course of a season. An abnormally high/low BABIP can affect a pitcher's ERA beyond just his skill - allowing more/fewer baserunners tends to have an effect on scoring.
Some of these are rather small issues (e.g., bullpen), but others are larger problems (e.g., park). Either way, the main point is that each of them detracts from the larger purpose of the stat - to tell us what the talent of a particular pitcher is. The issues we note dilute the stat. With each one, the stat becomes less about the pitcher and more about the run prevention team as a whole.
Nuanced Statistics and Why
Stats We're Talking About: FIP, xFIP, FIP-
We've discussed DIPS (Defense-Independent Pitching Statistics) Theory before, but let's do a quick recap. The basic idea is that pitchers have little control over what happens once the ball leaves their hand and is contacted by the batter. After that, it's really up to the talent of the hitter, the quality of the defense, and random variation. What pitchers can influence is the number of strikeouts, walks, and home runs they allow, through the quality of their pitches, the kind of pitches they have, and their general strategy.
FIP (Fielding-Independent Pitching) takes this theory to the extreme. Its formula is this:
FIP = ((13*HR)+(3*(BB+HBP))-(2*K))/IP + constant
The coefficients are derived through linear weights, and they weight the importance of each event. Home runs are really bad for a pitcher (duh) because each one is an automatic run. Walks are pretty bad as well because each one is a free baserunner. Strikeouts happen more frequently than the other two, and they're the counteracting agent in the formula, which is why they're subtracted. Walks and home runs add runs, and strikeouts prevent them. As for the "constant" added on, it's usually around 3.20 depending on the year you're talking about, but it's simply the league-average ERA minus the league-average FIP without the constant. It's there to scale the statistic to mirror ERA.
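To make the formula and the constant concrete, here is a minimal Python sketch. The function names are my own, and the league totals and pitcher line are invented purely for illustration, so treat the printed numbers as arithmetic rather than real data.

```python
def fip_core(hr, bb, hbp, k, ip):
    """FIP before the constant: (13*HR + 3*(BB+HBP) - 2*K) / IP."""
    if ip <= 0:
        raise ValueError("ip must be positive")
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / ip


def fip_constant(lg_era, lg_hr, lg_bb, lg_hbp, lg_k, lg_ip):
    """League-average ERA minus league-average FIP without the constant."""
    return lg_era - fip_core(lg_hr, lg_bb, lg_hbp, lg_k, lg_ip)


def fip(hr, bb, hbp, k, ip, constant):
    """FIP with the constant added, so it sits on the same scale as ERA."""
    return fip_core(hr, bb, hbp, k, ip) + constant


# Made-up league totals (assumption, not real data).
league = dict(lg_era=4.00, lg_hr=1000, lg_bb=3000, lg_hbp=300,
              lg_k=7000, lg_ip=9000.0)
constant = fip_constant(**league)

# Made-up pitcher line: 20 HR, 50 BB, 6 HBP, 180 K in 200 IP.
pitcher_fip = fip(hr=20, bb=50, hbp=6, k=180, ip=200.0, constant=constant)
print(f"constant = {constant:.2f}, FIP = {pitcher_fip:.2f}")
```

Because the constant is defined as the gap between league ERA and league FIP-without-the-constant, plugging the league totals back into `fip` returns the league ERA. That is all "scaling to ERA" means here.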
It does this because we want FIP to be an ERA estimator. "Estimator" is the key word. The question isn't exactly "What should his ERA have been?" but "What ERA would we expect given this pitcher's peripherals?" They sound similar, but here's the difference. The first question makes too many assumptions. It assumes that DIPS Theory is absolutely correct, and it absolutely blames defense and luck for the difference between FIP and ERA. That's not necessarily the case. Defense and luck likely explain most of the difference, but we might be able to blame the pitcher for at least part of it.
No, what we really want to know is what we can expect from a pitcher given his strikeout, walk, and home run rates. The basic idea is this. Given that the pitcher repeats these peripherals moving forward, what ERA can we expect from him moving forward? By removing defense, random variation, etc., this statistic focuses solely on the pitcher and what he does. Because of that, FIP can tell us the quality of a pitcher. But the original purpose of FIP was to "estimate" runs moving forward and not necessarily to tell us "what happened".
It doesn't exactly tell us "what happened" because ERA is "what happened". FIP, however, can be and is used to suggest a pitcher's quality of performance in the past. This is where the "What should his ERA have been?" question usually comes in. If FIP is using the actual K, BB, and HR rates, doesn't it also tell us about past performance? Sure, but we have to be careful about how we use it. Instead of "he should have had this ERA", we should say, "We would have expected his ERA to be this." Again, there is a difference. The former indicates an absolute truth that we simply can't state, and the latter reminds us that it's based on a theory, assumptions, and expectations.
Why is it scaled to look like ERA if ERA is so flawed? Again, it comes down to sample size. Over the course of 5+ years, the effects of defense, random variation, and bullpen get diluted, and the pitcher's performance shines through. It's not perfect because there are still some effects of context - if you pitch in front of an awesome defense for 5 seasons, it will still have an effect - but the effects have usually been minimized. FIP is still focused on run prevention, so as long as ERA's issues have been minimized, we want FIP to eventually match ERA, which it does for the most part by the end of a player's career. FIP is for seasonal talent, and ERA is better for long-term talent.
FIP, however, isn't completely perfect. We've talked about how extreme pitchers - really good/bad, sinkerballers/flyballers, and special pitch pitchers - begin to bend DIPS theory, but there's another slight issue - home run rate. For the most part, a pitcher will give up a home run on 10.6% of fly balls. Random variation, of course, messes with the actual rate in-season and even after a full season. If a pitcher has given up home runs on 20% of fly balls over 70 innings, that probably isn't an indication that he's going to be homer-prone. It may simply be how things have worked out, and some of them could have been of the "Just Enough" variety. Over time, we expect a pitcher's home run rate to move toward 10.6%, which can mean either going up or coming back down.
This leads us to xFIP (eXpected Fielding-Independent Pitching). This is now our formula:
xFIP = ((13*(Flyballs * League-average HR/FB rate))+(3*(BB+HBP))-(2*K))/IP + constant
You'll note that it's pretty similar to FIP, but instead of a straight HR number, it's now adjusting for home run rate. The league-average home run rate (number of home runs divided by the number of flyballs) is usually around 10%, but it does change. The advantage of xFIP is especially in the short-term when most stats haven't stabilized yet - or become indicative of a pitcher's actual talent. Basically, xFIP is what we would expect a pitcher's FIP to be given his strikeout and walk rates along with a league-average home run rate. I realize that it seems like we're getting further away from what we want - FIP is what we expect ERA to be, and xFIP is what we expect the FIP to be - but let's go over how to use these.
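Here is a small Python sketch of that substitution, set next to plain FIP for comparison. The function names, the 3.20 constant, the 10% league HR/FB rate, and the pitcher line are all illustrative assumptions rather than real data.

```python
def fip(hr, bb, hbp, k, ip, constant):
    """FIP = (13*HR + 3*(BB+HBP) - 2*K) / IP + constant."""
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / ip + constant


def xfip(fly_balls, bb, hbp, k, ip, lg_hr_per_fb, constant):
    """Same as FIP, but with expected HR = fly balls * league-average HR/FB."""
    expected_hr = fly_balls * lg_hr_per_fb
    return fip(expected_hr, bb, hbp, k, ip, constant)


# Made-up line: 30 HR on 200 fly balls (15% HR/FB), 50 BB, 6 HBP, 180 K, 200 IP.
line = dict(bb=50, hbp=6, k=180, ip=200.0, constant=3.20)
actual_fip = fip(hr=30, **line)
expected_fip = xfip(fly_balls=200, lg_hr_per_fb=0.10, **line)
print(f"FIP = {actual_fip:.2f}, xFIP = {expected_fip:.2f}")
```

In this made-up case the pitcher's HR/FB rate is above the league rate, so xFIP comes out lower than FIP. A pitcher with an unusually low HR/FB rate would see the opposite.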
ERA is flawed season-to-season because there are too many things that can influence it, but it does become handy once you get 5+ years of data and, especially, a career's worth. In order to get season-to-season indications of a pitcher's value, you'll generally lean toward FIP. FIP will tell you what we expected him to do based on his peripherals, and what we expect him to do moving forward if he can maintain those peripherals. xFIP is for in-season or seasonal usage when a pitcher's home run rate is pretty wonky. This helps us readjust our expectations. Paul Maholm, for example, had a 13.8% HR/FB rate, so his 4.41 ERA and 4.24 FIP are probably a little high, making his 3.89 xFIP a bit more indicative of his performance.
The issue with xFIP, however, is that pitchers have different home run rates. Do not confuse this with fly ball rates. They are different things. Tim Hudson and Mike Minor are both "expected" to have home run rates of 10%, but they will have different fly ball rates. Anyway, pitchers don't always have 10% home run rates. Many of them will, but relievers and good/bad pitchers tend to deviate from that expectation. Things like little movement on the fastball, lower velocity, and specialty pitches will affect home run rates, so you just have to pay attention to the pitcher you're analyzing. Generally, xFIP can give you a pretty good indication, but you should look at a pitcher's career HR rate and his FIP to see if he might be an exception.
Finally, we would like for these statistics to adjust for league and park, which FIP and xFIP do not do. They are prone to the same issues that ERA is on that front. This is where FIP- comes in. If you'll remember how wRC+ works, FIP- works similarly. wRC+ adjusts wOBA to account for league, park, and era, and it has 100 as the average with anything above being above-average and vice-versa. FIP- also adjusts for league, park, and era, but while 100 is still average, you want scores below 100 instead of above. They do this because the "minus" focuses on the idea of run prevention. Therefore, you want a lower number, not a higher one.
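To show the shape of an index like this, here is a deliberately simplified Python sketch. It is not the published FIP- calculation, which handles park and league adjustments in more detail. The function name, the way the park factor is applied (1.0 is neutral, above 1.0 favors hitters), and all of the numbers are assumptions for illustration.

```python
def fip_minus(pitcher_fip, league_fip, park_factor=1.0):
    """Simplified index: 100 is league average, lower is better.

    Assumption: the park adjustment is a plain division by park_factor,
    which is a stand-in for the real adjustment, not a copy of it.
    """
    if league_fip <= 0 or park_factor <= 0:
        raise ValueError("league_fip and park_factor must be positive")
    park_adjusted = pitcher_fip / park_factor
    return 100 * park_adjusted / league_fip


# Made-up numbers: a 3.54 FIP in a league where the average FIP is 4.00.
neutral = fip_minus(3.54, 4.00)
hitter_park = fip_minus(3.54, 4.00, park_factor=1.05)
print(f"neutral park: {neutral:.1f}, hitter-friendly park: {hitter_park:.1f}")
```

The same raw FIP earns a lower (better) index in the hitter-friendly park, which is the behavior the adjustment is meant to capture.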
Again, this is a lot, so let's recap. FIP uses peripherals - strikeouts, walks, and home runs - to tell us what ERA we can expect from a similar pitcher or from a pitcher who can sustain those numbers moving forward. This is good for season-to-season values. xFIP adjusts for wonky home run rates and tells us what we can expect from a pitcher with certain strikeout and walk rates. FIP- adjusts for league, era, and park.
What We Have Left to Accomplish
FIP actually works the best of all the run estimators, but there are still some things people would like to see improved.
One, it's a bit extreme on DIPS Theory, and while it works the best on average, the extremes see it bend pretty far. We'd like a metric that could even tell us what to expect from these pitchers on the extremes.
Two, we might prefer a metric that focuses on all runs allowed instead of just earned runs. Pitchers usually share some blame for even unearned runs, and in addition to the weird rules on errors and earned runs, earned runs may not be the best basis for discussing pitcher talent, despite the best intentions.
But FIP generally works pretty well, and given how easy it is to search and compare, it's the go-to pitching stat. That doesn't mean you don't need to adjust for context, but it does give you a pretty good idea.
- ERA attempts to tell us about the quality of a pitcher, but it includes influences of the defense, park, league, bullpen, and random variation.
- FIP removes most of the outside forces, and it focuses on peripherals - strikeouts, walks, and home runs.
- xFIP assumes a league-average HR rate and tells us what FIP we should expect given certain strikeout and walk rates.
- FIP- works like wRC+ in that it adjusts for league, park, and era, but better numbers are below 100, instead of above like with wRC+.