Everyday Stats and Their Flaws
Stats We're Talking About: ... Luck
There's no real equivalent stat to what we know as BABIP (Batting Average on Balls In Play), but the idea of BABIP has been around since the start of the game.
Despite never really writing it down or throwing it in a formula, we've always readily acknowledged that things don't always work out as they should. A guy pops one into No Man's Land and gets a hit while the next guy ropes a line drive for an out. Someone hits a bomb to center, but it's in Marlins Park instead of Citizens Bank Park. The batter smashes one into the hole, but there's a shift on and a play is made on the ball. I could go on, but I think you get the point.
But while we've always readily acknowledged these points, we've always kind of shrugged our shoulders and assumed that everything evened out in the end. 162 games, 600 PA, and six months is a long time, and it would seem to make sense that things would even out. Sure, a hitter would get a few bloop hits, but he's also going to hit a few ropes that end up in outs. The pitcher might have some grounders that get through the infield tonight, but on another night, he'll get the breaks. It sounds good.
But again, things never really seem to work out so cleanly. We like to believe in ideas like karma in which everyone gets what they deserve, but by this point in our lives, I'm guessing most of us realize that, even if it does exist, it rarely works quickly. Maybe after a few years or decades, people will reap what they sow, but the punishments/rewards of our actions don't necessarily come swiftly.
They don't in baseball, either. A 162-game season is statistically arbitrary. Sure, there are reasons for there being 162 games - financial, weather-related, or just plain arbitrary - but MLB didn't ask any statisticians - nor should it have - how long the season should be so that all of the important stats would be statistically significant, or so that they would fully represent a player's talent level. It's just one of those situations you can't do anything about. We want to know a player's talent/production for a season because that's the unit of baseball, but season length is constrained by finances, player health, and weather.
So as we start looking at BABIP, we'll discuss what we mean when we say "luck" or "random variation". We'll also examine the differences between how BABIP reacts for pitchers, for hitters, and for defenses. And we'll hopefully gain a better understanding of how this stat works between the main areas of baseball.
Nuanced Stats and Their Uses
Stats We're Talking About: BABIP, DefEff
BABIP, or Batting Average on Balls In Play, is relatively simple on its face. You know what Batting Average is, and this type of Batting Average simply concerns itself with Balls In Play - so things like K and HR don't influence it while sac flies and sac bunts are added back in. The equation looks like this:
BABIP = (H - HR) / (AB - K - HR + SF + SH)
Again, it's not too difficult. It's a slight modification of the H/AB formula of BA, but home runs and strikeouts are removed because the ball does not go in play where the defense can make a play on it. Sacrifice flies and bunts are added into the denominator because the ball was put in play, and they need to be accounted for, especially for defensive purposes. This handy-dandy stat can tell us a lot about each element of the game - hitting, pitching, and defense.
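To make the formula concrete, here is a minimal sketch in Python. The function name and the season line are made up for illustration; only the formula itself comes from the text above.

```python
def babip(h, hr, ab, k, sf, sh=0):
    """BABIP = (H - HR) / (AB - K - HR + SF + SH)."""
    balls_in_play = ab - k - hr + sf + sh
    if balls_in_play <= 0:
        raise ValueError("no balls in play, BABIP is undefined")
    return (h - hr) / balls_in_play


# Hypothetical season line, invented purely for illustration.
example = babip(h=150, hr=20, ab=550, k=100, sf=5, sh=0)
print(f"{example:.3f}")  # 130 / 435 -> 0.299
```

That same hitter has a .273 batting average (150/550), so his BABIP comes out higher than his BA once the strikeouts leave the denominator.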
Let's look at pitching first. When Voros McCracken did his research on pitchers and came up with his DIPS (Defense-Independent Pitching Statistics) Theory, he noted that pitchers tended to have a BABIP in the .290-.310 range. You'll note that those averages are higher than the normal batting average, but we would expect that as a result of strikeouts not being included in BABIP - not including outs is going to make the average go up. He observed that BABIP tended to fluctuate from season to season for various reasons - defense, park, and random variation - but it usually ended up around .300 for the pitcher's career. If so many pitchers' BABIPs tended to end up in that range, then he postulated that it was not a particular pitcher skill, and as a result, the other factors we mentioned were to blame.
Defense should make the most sense to you. If I stuck an infield behind Tim Hudson of Chris Johnson, Tyler Pastornicky, Dan Uggla, and Eric Hinske, his BABIP would be much higher than if you subbed in Martin Prado, Andrelton Simmons, Elliot Johnson, and Freddie Freeman. Hudson wouldn't magically become a better pitcher based on his defense, but his BABIP would change given the caliber of the people making the plays behind him.
Parks can also alter a pitcher's BABIP. If you tend to give up a lot of fly balls, your BABIP would be different in Petco versus Coors. In Petco, more of those balls would turn into long fly ball outs - thus lowering your BABIP - but more of those fly balls would turn into home runs in Coors. Home runs don't directly affect BABIP's formula, but if balls are turning into home runs, they aren't turning into outs, either. Having a massive amount of foul territory, such as in whatever Oakland's stadium is called this year, would also help more balls turn into outs than if you had less foul territory. Again, the pitcher gives up the same batted balls, but more of them turn into outs in some places than in others.
And now, let's talk random variation and luck. Luck has this connotation of "didn't deserve" that isn't particularly accurate. Random variation makes more sense. The basic idea is like flipping a coin. Because there are two sides of a coin, we would expect it to land on heads 50% of the time. When looking at baseball, it's like stopping your count every 162 flips and adding up the percentages. Sometimes, you'll get 50%. Other times, you might get 45%, 52%, 58%, and so on. The 162 cutoff is arbitrary - we just chose it for no statistically significant reason - and as a result, you can't expect the percentages to work out perfectly every 162 flips. But after 3,240 flips (or 20 seasons of 162), my guess is that it ends up pretty close to 50%. This is what we mean by random variation - the theory works overall, but in the snippets you examine in between the beginning and the end, things won't always work so elegantly.
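The coin-flip idea is easy to try for yourself. Here is a rough sketch that flips a fair coin in 162-flip "seasons" and compares the season-by-season percentages to the 20-season total. The function name, the seed, and the 20-season default are my own choices for illustration.

```python
import random


def season_rates(n_seasons=20, flips_per_season=162, p=0.5, seed=0):
    """Return the share of heads in each simulated 'season' of coin flips."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    rates = []
    for _ in range(n_seasons):
        heads = sum(rng.random() < p for _ in range(flips_per_season))
        rates.append(heads / flips_per_season)
    return rates


rates = season_rates()
overall = sum(rates) / len(rates)  # equal-sized seasons, so this is the pooled rate
print(f"lowest season:  {min(rates):.3f}")
print(f"highest season: {max(rates):.3f}")
print(f"all 20 seasons: {overall:.3f}")
```

The individual seasons should wander a few points above and below 50%, while the pooled number over all 3,240 flips should sit much closer to it. Change the seed and the individual seasons move around, but the overall pattern holds.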
Of course, more research has since been done, and we know that not every pitcher's BABIP tends toward the .290-.310 range. For probably 95% of pitchers, it does, but people at the extremes - the Cy Youngs and the Cy Yuks, the extreme sinkerballers and the extreme flyballers, and the knuckleballers or specialists - will tend to deviate from our expectations by 10-20 points. That's fine. But we need around 3-5 seasons of data before we can start concluding anything about it. Until then, we can suspect but not deduce. If a pitcher is not one of the special cases I mentioned AND he has an abnormal - high or low - BABIP for a season, you should suspect it isn't a representation of his skill, but you should be prepared for the possibility that it is.
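For a rough feel of why it takes multiple seasons, you can treat every ball in play as an independent coin flip with a fixed chance of becoming a hit. That is a simplification of my own, not something the research above claims, and the 500-balls-in-play season is just a round number I picked. Under those assumptions, the standard error of an observed BABIP looks like this:

```python
import math


def babip_standard_error(true_babip, balls_in_play):
    """Binomial standard error of an observed BABIP (simplifying assumption:
    every ball in play is an independent trial with the same hit probability)."""
    return math.sqrt(true_babip * (1 - true_babip) / balls_in_play)


# Assumed workload: 500 balls in play per season (a round, hypothetical figure).
for seasons in (1, 3, 5):
    bip = 500 * seasons
    se = babip_standard_error(0.300, bip)
    print(f"{seasons} season(s), {bip} BIP: +/- {se:.3f}")
```

With these made-up inputs, one season carries a standard error of about 20 points of BABIP, which is as large as the 10-20 point deviations we are trying to detect. After five seasons it drops to about 9 points, which is where the differences start to become visible.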
Hitters are a bit different from pitchers. While a pitcher's BABIP tends toward .300, a hitter's does not necessarily. Most hitters will probably be in that area, but a substantial number are not. Freddie Freeman, for instance, has a career BABIP of .334. That doesn't mean that he fluked his way there. His high LD% (Line Drive%) of about 25% (the average is around 18%) helps him maintain a higher BABIP because line drives, as you might expect, tend to turn into hits more often than other types of batted balls.
Of course, it could go the opposite way, like with Andrelton Simmons. His LD% is about 18%, but he has an abnormally high IFFB% (Infield Fly Ball%). There's a misconception with IFFB%. It is NOT the # of popups/# of batted balls. It's the # of popups/# of flyballs. So while Andrelton has a solid LD%, more of his flyballs are routine outs, which helps explain his .267 career BABIP.
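Since the IFFB% denominator trips people up, here is a small sketch of the two calculations side by side. The counts are invented for illustration, and I'm assuming the flyball total already includes the popups.

```python
def iffb_pct(popups, fly_balls):
    """IFFB% as described above: popups divided by flyballs (not all batted balls)."""
    return popups / fly_balls


def popups_per_batted_ball(popups, batted_balls):
    """The common misreading: popups divided by every batted ball."""
    return popups / batted_balls


# Hypothetical hitter: 400 batted balls, 140 of them flyballs, 28 of those popups.
print(f"IFFB%:                  {iffb_pct(28, 140):.1%}")                 # 20.0%
print(f"popups per batted ball: {popups_per_batted_ball(28, 400):.1%}")  # 7.0%
```

Same 28 popups, but the number looks very different depending on which denominator you use.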
There are other reasons that a hitter's BABIP might be different. A player who hits a lot of flyballs will generally have a lower BABIP because fly balls turn into outs quite often. A player with speed who hits more groundballs will tend to have a higher BABIP because they can turn some of those into hits. There is an overall offensive trade-off between them - hitting more flyballs turns into more extra-base hits while groundballs do not - but they will affect their BABIPs. A player like Michael Bourn may want to think about hitting more groundballs, but we would want Freeman and Jason Heyward to hit more flyballs because the extra home runs would be helpful even if it lowered their BABIP.
Over a hitter's career, he will tend toward a certain number. Using Freeman again, he has a high BABIP for his career at .334, but the .371 mark he posted in 2013 is almost 40 points higher. Chances are that Freeman didn't simply find another gear - his LD% didn't sharply increase - and his overall numbers will go down unless something offsets it, such as a lower K rate that puts more balls in play or a few more home runs. Chris Johnson is another example. His .394 BABIP was ridiculous, but his .361 career BABIP means that we shouldn't expect it to completely collapse next season. With hitters, you need to allow a few seasons to see where a player's talent level is, and until then, use their batted ball profiles to see if a higher/lower BABIP makes sense.
The one catch to that is that a hitter can change or develop. So a player may start off as a certain BABIP-skill hitter, and he may improve/decline over time. To monitor these changes, look at a hitter's batted ball profile - the number of line drives, fly balls, groundballs, and popups - to see if there have been any significant changes, and then ask if he's made any changes in approach or hitting mechanics.
Finally, we get to defense. Franklin uses BABIP quite a bit to talk about defense. The trick when talking about certain parts of a team's defense is to remember that different batted balls go to different parts of the field. The infield will see more grounders and surrender more hits while the outfield will see more flyballs and more outs. When looking at overall defense, you'll hear about Defensive Efficiency (DefEff or DE). It's basically just 1 - BABIP. DE tells us what percentage of the time a defense turns a ball in play into an out. Over the course of a season, there are a lot of plays that happen to a team's worth of defensive players, so the number is actually pretty reliable. Looking at 2013's leaderboard, you'll notice the Braves had a .712 DE, which means they turned 71.2% of balls in play into outs or that opposing teams had a .288 BABIP against them. Either way works. It just depends on how you want to look at it.
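The DE-to-BABIP conversion is a one-liner. Here it is using the Braves' 2013 numbers quoted above; the function name is just mine, and this is the "basically 1 - BABIP" shortcut rather than any official formula.

```python
def defensive_efficiency(babip_against):
    """Approximate Defensive Efficiency as 1 - BABIP allowed."""
    return 1 - babip_against


print(f"{defensive_efficiency(0.288):.3f}")  # 0.712
```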
Of course, parks can have an effect on a team's defense - foul territory, quirky features, etc. - which is why there is PADE (Park-Adjusted Defensive Efficiency). It makes some nifty adjustments, but it's not used very much anymore. People have tended toward UZR and DRS for their defensive needs, and PADE doesn't really answer a question we frequently ask. We are often more concerned with individual defensive talent, and BABIP, DE, and PADE are more for TEAM defense than individual defense.
So let's recap. Pitchers tend to have BABIPs near .300, but there are some exceptions here and there. Hitters will be more all over the map, and you'll look toward batted ball profiles more than assuming a .300 mark. Total team defense will also use BABIP, and because of the sheer number of plays for a full team, the numbers for a full season tend to be pretty reliable. Pitchers and hitters are just individuals, and there are many fewer plays to deal with, which is why it takes longer for us to be sure about their talent levels.
What We Have Left to Accomplish
There's not much for anyone to really improve on here. BABIP is pretty straightforward. The real breakthrough could be a more definitive xBABIP (eXpected BABIP) for pitchers and hitters. There are some decent ones out there, but they aren't easy to find for individual players. The ideal version looks at batted ball profiles, etc. to get an idea of what other players with a similar profile have done. Once you have that, you would present it in a way that makes it easy to see and compare players. And then, you would adjust their overall slash lines, wOBA, and wRC+ to reflect more of what we "expect" that player to do.
Why would we want this? It's going back to the difference between "talent" and "production". Freeman and Johnson produced a lot this past season, and you can't take it away from them. But that doesn't mean they were as good as their production - or that they can repeat that level of production. It's not a criticism. It just is what it is. But when a front office plans for next season, it would be nice to have a tool that gives us a better estimation of a player's talent, or what we should expect. Chances are that Freeman and Johnson benefited a little from luck, and it would be nice to also know what we should have expected given their batted ball profiles. Talent and production are both important answers, but they do tend to answer different questions.
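To show the general shape of the xBABIP idea, here is a deliberately crude sketch. It weights each batted ball type by an assumed hit rate. The hit rates below are placeholder numbers I made up to illustrate the mechanics, not measured league values, and real xBABIP models use more inputs (speed, park, and so on). It also ignores the fact that some flyballs leave the park and aren't in play at all.

```python
# Placeholder hit rates per batted-ball type. These are illustrative
# assumptions, NOT measured league averages.
ASSUMED_HIT_RATE = {
    "line_drive": 0.70,
    "ground_ball": 0.24,
    "fly_ball": 0.14,
    "popup": 0.02,
}


def expected_babip(profile, hit_rates=ASSUMED_HIT_RATE):
    """Weight each batted-ball count by its assumed hit rate."""
    total = sum(profile.values())
    if total <= 0:
        raise ValueError("profile has no batted balls")
    hits = sum(count * hit_rates[kind] for kind, count in profile.items())
    return hits / total


# Two hypothetical hitters, counts per 100 balls in play.
line_drive_hitter = {"line_drive": 25, "ground_ball": 40, "fly_ball": 30, "popup": 5}
popup_hitter = {"line_drive": 18, "ground_ball": 42, "fly_ball": 28, "popup": 12}

print(f"{expected_babip(line_drive_hitter):.3f}")
print(f"{expected_babip(popup_hitter):.3f}")
```

Even with made-up weights, the direction matches the discussion above: the hitter with more line drives and fewer popups gets the higher expected BABIP. Comparing a number like this to what the player actually posted is the "talent versus production" check.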
- We tend to assume that "luck" evens out over the course of a season, but it doesn't really HAVE to.
- Pitchers tend toward a .300 BABIP, but if they are a special case - really good/bad, have a special pitch, or get a certain batted ball to an extreme - they may deviate by another 10 to 20 points.
- Hitters tend toward a career norm that is often the result of their batted ball profile, and while they tend toward .300 on average, there's more variation on the individual level.
- Because there isn't enough data in one season for the individual player, it often takes 3-5 seasons to start to see BABIP skill.
- Team defense also uses BABIP (or Defensive Efficiency) to tell us how many balls in play they turned into outs, and because there are a lot of plays in one season for an entire defense, the numbers for each season are pretty good.