(You'll have to forgive the absence of these posts over the last few weeks. The holidays are a busy time of the year, and they certainly were for me. But now we'll get back to our regularly scheduled programming.)
Everyday Stats and Their Flaws
Stats We're Talking About: ... clutch
This should be a fun discussion. Clutchness is one of the more controversial issues in baseball analysis. The reason probably stems from our fixation on heroes, on people who rise to the occasion despite the obstacles. It's a tale as old as time really, and while context and situations often dictate endings, we usually like to give success a human face. Facing a dominant closer in the bottom of the ninth inning is one of those occasions, and if a batter gets a hit in that situation, he's defined as clutch, never mind that most batters make an out in that situation anyway. Of course if he doesn't, he gets the label of having "choked". But to be perfectly honest, being "clutch" is more of a tool for storytelling than for describing and analyzing players.
There have been several studies on the idea of being clutch, and while there's a mild effect at times, there's not enough of one to get excited about. And to be honest, that shouldn't be terribly surprising. When we talk about the very, very top baseball players who have ever played the game, the differences in their skill aren't that great. And by virtue of having played hundreds and hundreds of games against stiff competition, they've been through enough big spots that they should have learned to handle them at some point. This isn't to say that there aren't exceptions, but everyone thinks their player is an exception.
And it's also not to say that some situations aren't different from others. When a pitcher comes on in the 9th inning, I bet there's more adrenaline, and I bet he's more focused. But chances are that a hitter is a bit more jacked up and focused as well, and the effects - again, given the .0001% of the population we're discussing - probably wipe each other out. In these situations, we tend to see things from either the pitcher's perspective or the batter's perspective - but not both - and it's usually based either on our loyalties or the story we want to tell. For instance, Kimbrel comes through in the clutch because he's dominant and one of our players, and it can't be that the batter choked. Being "clutch" is often a matter of perspective, not truth.
But can't there still be players who perform better than others in these instances? Probably. But we want some sort of evidence for that assertion. One way is to look at the statistics, and those say that there's a fairly insignificant effect - players usually hit or pitch in high-leverage situations as they do overall. The other is to know what each of the players involved is thinking, and frankly, that's a fool's errand. We like to play armchair psychologists, but we really shouldn't. Outside of those two things, telling who is and who isn't clutch is a bit difficult.
Being clutch sounds good. It's a nice idea, and it's definitely a way to spin a story. But it's often used to tell a story that we want to tell instead of telling the story that actually happened. You can't simply say, "Well, I've seen him come through/not come through in big situations." I'm sure you have. Chances are, however, that every player has both come through and not come through in big situations, but we'll often remember the instances that reaffirm our thoughts and opinions on a person - it's called confirmation bias - and forget those that don't (or simply discount them as flukes).
In the end, we can look at hits or other events as being clutch because they are. That one moment can be clutch or unclutch because you're simply describing an event, but we have to be more careful using those words to describe players in that manner. If you do, you better bring some evidence - and not anecdotal - to the table.
Nuanced Stats and Why
Stats We're Talking About: WPA and LI
For the most part, nuanced stats try to be "context-neutral". This means that the statistic does not worry about what part of the game it happened in, how many runners were on base when it happened, or who was on the mound/at the plate. There's a good reason for this - players often can't control the circumstances they're thrown into. A batter can't force his way to the plate in the ninth. A manager can't put his best hitter at the plate with runners in scoring position, and it's tough to just stick pitchers in wherever (though it's obviously much more possible than for hitters). So the idea is to remove context and see how players perform overall.
Hoooweeevvver, we all know that context is often important as well - after all, we know that in-game situations differ in pressure and impact. This is why there are multiple stats - they answer different questions. For example, wOBA is context-neutral, and wRC+ takes context into account. Park factors, league, and era are the ones often cited in regard to nuanced stats, but those are more for long-term stats. The issue is that there are in-game contexts that also matter - I mean, it's good to know who came through and who didn't, even if it's not necessarily sustainable or indicative of much. That's where we get WPA and LI.
WPA (Win Probability Added) should be fairly straightforward. Based on how games have played out historically, there is a thing called "win expectancy" for each moment of the game. Given a certain score, what inning you're in, how many runners are on base, and how many outs there are, we can tell you what percent of the time you would expect to win - or what your win expectancy is - based on how teams have performed in the past. WPA is the change in win expectancy from one event to the next.
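If it helps to see the idea of win expectancy as code, here is a minimal sketch. Everything in it is an assumption made for illustration: the function name, the way a game state is encoded, and the four toy records are all made up, and real win expectancy tables are built from many seasons of play-by-play data.

```python
from collections import defaultdict

def build_win_expectancy(records):
    """Map each game state to the share of past games the batting team won."""
    tallies = defaultdict(lambda: [0, 0])  # state -> [wins, times seen]
    for state, batting_team_won in records:
        tallies[state][0] += int(batting_team_won)
        tallies[state][1] += 1
    return {state: wins / seen for state, (wins, seen) in tallies.items()}

# Hypothetical state encoding: (half, inning, outs, runners, score difference).
# "1--" means a runner on first only; -1 means the batting team trails by one.
state = ("bottom", 9, 2, "1--", -1)

# Made-up toy history: this state came up four times, and the batting team won once.
toy_records = [(state, True), (state, False), (state, False), (state, False)]

win_expectancy = build_win_expectancy(toy_records)
print(win_expectancy[state])  # 0.25
```

The lookup is nothing more than "how often did teams in this exact spot go on to win", which is why it depends only on the score, inning, outs, and runners.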
Let's look at an example. Player A comes to the plate with his team having a 57% chance to win, but he hits into a double play, dropping the win expectancy to 41%. His WPA would be .41 - .57 = -.16 WPA for that plate appearance. What you want to know is what the win expectancy is coming into the plate appearance, and what it is after. Then, you simply add up all of the plate appearances to get the player's WPA for the entire game. It works for hitters and for pitchers - you can add up each batter faced for the pitcher's WPA. Here is the WPA chart for the September 26th game so you can take a look.
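The arithmetic above can be sketched in a few lines. The function names are my own, and only the first plate appearance (57% down to 41%) comes from the example; the other two are made up so there is something to add up.

```python
def plate_appearance_wpa(we_before, we_after):
    """WPA for one event: win expectancy after minus win expectancy before."""
    return we_after - we_before

def game_wpa(plate_appearances):
    """Sum the WPA of every (before, after) pair a player was involved in."""
    return sum(plate_appearance_wpa(before, after)
               for before, after in plate_appearances)

# Player A's double play from the example: 57% before, 41% after.
double_play = plate_appearance_wpa(0.57, 0.41)
print(round(double_play, 2))  # -0.16

# A hypothetical full game: the double play plus two made-up plate appearances.
player_a_game = [(0.57, 0.41), (0.38, 0.45), (0.50, 0.62)]
print(round(game_wpa(player_a_game), 2))  # 0.03
```

The same two functions work for a pitcher if you feed in the win expectancy before and after each batter he faced, from his team's point of view.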
Over the course of an entire season, most players won't accumulate much, but a WPA of 1.00 is equivalent to 100% of a win, or one entire win. And they do accumulate, so there are players with seasonal WPAs over 6.0. One might think of it as the nuanced RBI. It's still not really indicative of future production, but it does tell us what happened, which is still a question worth answering.
LI (Leverage Index) is a little more complicated because the idea behind the math isn't as clear, but it covers a similar idea. Instead of win expectancy, LI attempts to quantify how pressure-filled a situation is. It's still based on the score, inning, number of runners and where they are, and the number of outs, and it helps us with the idea of clutch. An LI of 1.0 is pretty average, and anything above 1.5 is high-leverage. LI attempts to tell us how important the situation is.
So what's the difference? WPA analyzes the play after it happens. It takes a look at the win expectancy as a result of the play versus what it was before the play. LI grades the situation as the players go into it. WPA tells us how much a player changed his team's chances of winning, and LI tells us how difficult that situation was.
Now comes the fun part - WPA/LI. WPA, as we mentioned, is in context. But it's still not someone's fault if he comes up to the plate in easier or more difficult situations than his teammate did. WPA/LI takes out some of the context, but it gives us the impact value of the player, comparing how well he did in the situation to how hard the situation was. This might be a little difficult to firmly grasp, so let's use an example.
Let's say Player A singles in two runs in the bottom of the 8th and gets a WPA of .35, and the LI of that situation was ... 1.5. The next guy comes up and jacks a home run for a WPA of .15, but his LI was 1. It's not Player B's fault that he came up a player too late to get the LI boost. If we use WPA/LI, Player A gets a score of .35 / 1.5 = .233, and Player B gets a score of .15 / 1 = .15. .233 is still larger than .15, but it's not quite as substantial as the gap between .35 and .15. Player A still has a greater impact on the game, but it was mainly because of the situation he walked into. (Note: I made up all of those numbers, but I didn't feel like trying to find an example. Sue me.)
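Here is the same comparison as a small sketch, using the made-up numbers from the example. The function name is mine, and dividing a single play's WPA by that play's LI is the only thing it does.

```python
def wpa_per_li(wpa, li):
    """Scale one play's WPA by the leverage of the situation it came in."""
    if li <= 0:
        raise ValueError("leverage index must be positive")
    return wpa / li

# Made-up numbers from the example above.
player_a = wpa_per_li(0.35, 1.5)  # two-run single in a higher-leverage spot
player_b = wpa_per_li(0.15, 1.0)  # home run in an average-leverage spot

raw_gap = 0.35 - 0.15
adjusted_gap = player_a - player_b

print(round(player_a, 3))      # 0.233
print(round(player_b, 3))      # 0.15
print(round(raw_gap, 3))       # 0.2
print(round(adjusted_gap, 3))  # 0.083
```

Player A still comes out ahead, but the gap shrinks from .20 to about .083 once the leverage is taken out.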
In the end, WPA/LI ends up as a not-so-context-neutral WAR, but 0 is average and not replacement level. So there's that important distinction. But this is ultimately how "clutch" is quantified.
So let's recap. WPA tells us how much a player helped or hurt his team's win expectancy. LI tells us how pressure-filled the situation was when the batter came to the plate or the pitcher came into the game. And WPA/LI gives us a sense of the impact a player has, like WAR but using game contexts.
What We Have Left to Accomplish
To be honest, there's not a whole lot left to really do here. Continual work will always be needed to adjust the WPA as run environments change and such, but it's pretty sound. The thing to remember is that, while it quantifies "clutch", it is not something we expect to be sustainable or indicative of much. It simply tells us what happened, but it doesn't really make much of a judgment about a player.
What I would be interested to see, however, is a seasonal WPA, LI, and WPA/LI. It's often said that a game in April counts the same as one in September, but that's not necessarily true for the same reasons a home run in the first isn't necessarily as valuable as a home run in the ninth. As the season goes on, a team's record and place in the standings become clearer, and performing well late in the year becomes more valuable. In a way, hitting well in September during a pennant race should count more than performing early in April. More is simply on the line.
But let's be clear on what I'm saying. I'm not saying that a player is better than another because he performs late in the season. Unless one can prove he steps up late in the year every time, I'll say the same thing about him that I do about people who perform late in the game - it's nice, but it's not indicative of anything. But if a player performs well for a division champ down the stretch, that performance is worth more than that of someone who played for a team at the bottom of the standings. You don't make a front office decision based on it - again, unless it's a proven thing and proven to be substantially more valuable - but it could tell us the impact a player had on a team's season.
Even if that were to come into existence, it's still not Mike Trout's fault that his team sucks, and it's not a tale of heroics that Miguel Cabrera had the best pitching staff in baseball to work with. So you have to be careful what you would use this for. It's not necessarily an indication of true talent, but it would tell us the impact a player has had on that team's season. Again and as I hope you've noticed so far, it really depends on the question we're looking to answer. Different questions merit different answers.
- "Clutch" is more of a storytelling tool than anything. It's often used to tell the story we want to tell rather than the one that actually happened. We aren't in a player's mind, and we shouldn't pretend to be.
- WPA (Win Probability Added) tells us how a particular player or event impacts a team's win expectancy.
- LI (Leverage Index) tells us the pressure or difficulty of a situation.
- WPA/LI removes the leverage and tells us the impact a player had on his team in context, and it acts as a sort of WAR with win expectancy.