Talking Chop Baseball Analysis Primer: WAR

[Photo: MLB: St. Louis Cardinals at Atlanta Braves. Credit: Jason Getz-USA TODAY Sports]

Imagine, for a moment, the days of yore. Some folks are gathered around a kitchen table, discussing major league baseball. One of them posits, “Grumps McSmashMan is the best baseball player in the game today!” Someone else disagrees and nominates a different player. How are they to resolve this issue? If both players are hitters, you could look at their stats and compare them. But what if they play different positions? Or, worse yet, what if one is a pitcher, and one is a hitter? How can you compare anything?

But let’s say that somehow these folks came to a resolution and agreed that yes, Grumps McSmashMan was indeed the best player of his era. What’s to stop a third person from chiming in and saying, “Pah, baseball players these days are weaklings! He doesn’t hold a candle to the greats of yesteryear, like Santiago McOldTimer!” Can this issue be resolved? Sure, you could try the same approach you’d use to resolve disagreements about players active at the same time. But what if the game has changed dramatically between yesteryear and today? Is a 20-homer season really the same no matter when it occurs? And if the rate of homers changed over time, then it’s possible the overall rate of run scoring did as well. If two pitchers allowed the same number of runs per inning, but one played in an era where teams scored twice as many runs as in the other’s era, are those two pitchers really the same?

WAR does a lot of things, but for my money, the ability to settle these arguments is pretty much why it exists. That isn’t its only purpose, and it’s flexible enough to serve a lot of other very useful roles as well. But, primarily, WAR exists to boil down all the different things that a player can do on a baseball diamond into a single number, so that players can be compared, no matter what position they play, no matter when they played, no matter what their home park is, and so on and so forth. It’s not the only relevant number for assessing player quality, but its job is to be the number you’d take if you had to take just one. In life, we aren’t restricted to using just one number, of course. But as a shorthand, it does the trick. Without it, you’d be stuck comparing pitching metrics to batting lines and probably never able to climb out of that particular rabbit hole. (If you were able to climb out, it would be by creating some kind of equivalency in the form of a single number, like WAR ends up being.)

I’ve seen people get hung up on the words behind WAR: Wins Above Replacement. Both “Wins” and “Replacement” are kind of weird concepts. Replacement, by definition, isn’t something you see -- it’s the “other option” lurking in the periphery. And wins, well, wins are something a team does, not a player. So, what gives? Because of that, I think it’s helpful, if you are completely unfamiliar with WAR, to think of it first not as the words behind the acronym, but just as a word, or a symbol. You can call it “W” or “Cats” if you want, I don’t care. Just move past the concept of “Wins Above Replacement” for the time being, and think of it only as the number that gives you a way to compare all players. Once you’re in that mindset, here’s what to know about how WAR is derived.

First, there are always 1,000 WAR per season. Why 1,000? Well, you could say “eh it’s just a number.” 1,000 is nice and round, and when divided across the landscape of players in a given baseball season, gives you nice, single-digit WAR values to work with. In reality, there could probably be any amount of WAR up for grabs in a season, but 1,000 is used for a specific reason. We have to quickly detour to what “replacement level” means to see it.

Replacement level is a weird concept. For analysis purposes, it just has to be set somewhere. Say it’s dinnertime, you’re hungry, you want to eat something. You know all the restaurants in your town, and have a mental rating for each of them. In your mental rating system, you know that four stars is better than three stars, and if it’s a cardinal rating system, it’s 33 percent better. The meaning of quality for a four-star restaurant is directly tied to its number of stars, and that lets you compare it to any other restaurant with its own star rating. Functionally, if you wanted to, you could call a one-star restaurant the “replacement level.” That means every other restaurant would be some number of stars “above replacement.” That’s really all the mental gymnastics you need to think about replacement level -- it’s just some designated amount of quality. (If you wanted to, you could also set replacement level at staying home and cooking for yourself, but either way, all it does is give you some bar against which to compare every restaurant; where that bar is doesn’t matter much.)

For baseball players, replacement level is set at 0 WAR, which makes sense, because replacement level is zero wins above replacement. It’s also set such that a team of replacement-level players would win around 47.5 games in a season. Why 47.5? Well, it ties in to the 1,000 number above; you’ll see why in a second. Note that this is just a number. If you wanted a different replacement level, you could use one. In fact, Baseball Prospectus’ WARP metric, which is very similar to WAR in its aims and purpose, uses a different replacement level than 47.5 wins per season. The thing with WAR is that it really is “wins,” such that one WAR-win is meant to represent one actual-win-by-the-team-at-the-end-of-the-season. We’ll come back to that. For now, take 1,000, and divide it by 30 teams. You get 33 and a third. Now add that 33.3 to the 47.5. You get right around 81 wins, i.e., a .500 team with the same number of wins and losses. (In reality, the number is not 47.5, but closer to 81 less 33.3; it’s just easier to refer to it as 47.5 rather than 47.infinitedecimals.) In other words, if you spread the season’s available WAR evenly across all teams, you’d get a league of perfectly-matched, .500 teams, each with 81 wins. Basically, the 1,000 WAR figure and the 47.5 (it’s actually 47.6ish) wins figure are two sides of the same coin. Is knowing any of this really necessary for understanding player value? Not really, but I bring it up because I think it highlights that WAR is pretty elegant and “feeds into itself” well.
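If you'd rather see that arithmetic than take my word for it, here's a rough sketch. The constant names are made up for illustration, and the 162-game schedule is an assumption on my part (the text above only says that 81 wins is a .500 record). It just does the "81 less 33.3" subtraction described above.

```python
# A minimal sketch of the "1,000 WAR <-> replacement-level wins" arithmetic.
# All names here are illustrative, not taken from any stats site.

TOTAL_WAR_PER_SEASON = 1000   # WAR available league-wide, per the primer
NUM_TEAMS = 30
GAMES_PER_SEASON = 162        # assumption: standard schedule, so .500 = 81 wins

# Spread the season's WAR evenly across every team.
war_per_team = TOTAL_WAR_PER_SEASON / NUM_TEAMS

# A perfectly average team wins half its games.
average_team_wins = GAMES_PER_SEASON / 2

# Whatever is left after removing the team's share of WAR is what a
# roster of replacement-level players would be expected to win.
replacement_team_wins = average_team_wins - war_per_team

print(f"WAR per team if spread evenly: {war_per_team:.2f}")
print(f"Implied replacement-level team wins: {replacement_team_wins:.2f}")
```

The point isn't the exact decimal; it's that once you pick one of the two numbers, the other one falls out automatically.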

There’s another reason why WAR is elegant, and in this case, it’s the elegance of utility rather than the elegance of numbers adding up together. Luckily for me, Devan Fink summarized this for Beyond the Box Score not long ago: https://www.beyondtheboxscore.com/2018/12/26/18155292/correlation-war-wins-pythagorean-expectation-second-order-wins-third-order-wins

The basic idea is that WAR correlates with team wins very well. If it didn’t, we just wouldn’t use it. The whole idea logically follows like this:

  • WAR is a way of assigning player value
  • Under the premise of WAR, WAR has a one-to-one relationship with team wins at the end of the year
  • That is, you can take WAR for all of a team’s players, add it up, and add it to around 47.5, and you’ll get the team’s end-of-season win total (even though WAR doesn’t actually look at who won what games)
  • In the end, we find that this works really well!

Hence, this is why we use WAR. Now, “really well” isn’t perfect. And there’s no way it could be perfect, because as mentioned in that third bullet, WAR doesn’t use, as an input, whether the team actually won or lost any games, or even how many runs the team scored, or how many runs it allowed. If you take a step back and think about how cool it is that a stat that uses no direct game-state information (outs, inning, score, etc.) correlates almost perfectly with a stat that results entirely from game-state information, well, I don’t know what’ll happen. But it blows my mind every time I think about it. (For the persnickety among you, recall that even ratios of runs scored to runs allowed don’t correlate perfectly to wins, because it’s baseball, and things happen. Expecting perfection misses the point.)

Still not convinced? Try it yourself. Go to Fangraphs, pull down team WAR (you’ll need to pull down separate position player and pitching WAR and add them together), and compare each team’s WAR total to its actual win total over any number of years. The correlation will always be very, very strong, almost perfect (but not quite). Again, if it weren’t, then we’d be using something else instead of WAR as it currently is; we use WAR right now specifically because it’s a stat that allows us to compare all players to one another, while also giving us really good insight into how many games a team will win at the end of the year. In the end, aren’t those the two things you really want from a stat? (And if a stat only did one or the other, then it’s kind of unclear how useful it would be, since baseball is kind of rooted in the idea that better players lead their teams to win more games.)
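If you do try it yourself, the check could look something like the sketch below. Big caveat: the four rows of team data are placeholders I invented so the code runs; they are not real teams or real seasons. Swap in whatever you actually pull down. The function names are mine too, and 47.5 is just the rounded replacement-level figure from above.

```python
from math import sqrt

REPLACEMENT_TEAM_WINS = 47.5  # rounded figure used in this primer


def predicted_wins(position_war, pitching_war):
    """Wins implied by WAR: replacement-level wins plus total team WAR."""
    return REPLACEMENT_TEAM_WINS + position_war + pitching_war


def pearson(xs, ys):
    """Plain Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# PLACEHOLDER DATA, invented for illustration only:
# (team label, position player WAR, pitching WAR, actual wins)
teams = [
    ("Team A", 30.0, 20.0, 95),
    ("Team B", 20.0, 15.0, 84),
    ("Team C", 15.0, 12.0, 73),
    ("Team D", 8.0, 7.0, 65),
]

total_war = [pos + pit for _, pos, pit, _ in teams]
actual_wins = [wins for *_, wins in teams]

for name, pos, pit, wins in teams:
    print(f"{name}: predicted {predicted_wins(pos, pit):.1f}, actual {wins}")

r = pearson(total_war, actual_wins)
print(f"Correlation between team WAR and wins: {r:.3f}")
```

With real data, the interesting part is how far each team's actual win total lands from its predicted one; those gaps are the "it's baseball, and things happen" part.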

Anyway, here are a few more things about WAR that are probably useful to know.

Some work a while ago determined that baseball teams, collectively, spent about 57 percent of their resources (payroll) on position players, and the remaining 43 percent on pitchers. Based on this, 570 of the season’s 1,000 WAR is spread across position players; the remaining 430 is spread across pitchers.

It’s too mathy to be useful in this primer, but the basic unit of analysis in baseball isn’t the win, but the run. The rule of thumb is around 10 runs to a win, though in reality, it tends to be a bit lower than that. There’s a whole body of work about converting from wins to runs, but functionally, it doesn’t matter much. It’s just like saying things in centimeters versus meters, because the basic idea is that you need some amount of run scoring and run prevention to win games, and you can talk about that as either the inputs in terms of runs, or the outputs in terms of wins, but it’s all the same.

This is kind of relevant because replacement level for a player is defined as “20 runs below average” over a full season. Note that this isn’t specific as to how the player is 20 runs below average; any combination will do. Again, this is just a construct -- think back to the restaurant example. A restaurant can be replacement level because the food is bad, or the service is bad, or there’s no parking, or it’s too expensive for what it is, blah blah blah. But the point is, you drill down to a single level, and in this case, that level is 20 runs below average. A great hitter who sucks at everything else can be replacement level. So can a great fielder who can’t hit at all. Or a pitcher. Conversely, a player who’s 20 runs above replacement is average, which is the same thing as saying a player who’s 20 runs below average is replacement. Remember that average isn’t replacement; the two are pretty far apart! Average players are decent; replacement-level players are, by definition, not decent.

WAR is a counting stat. The more you play, the more WAR you get. 0.5 WAR in 600 PAs is not very impressive. 0.5 WAR in 60 PAs is incredibly impressive. However, you should be careful in calculating WAR/600 figures (i.e., take total WAR, divide by total PAs, multiply by 600) because those figures assume that the player will play just like he already has for 600 PAs, which isn’t always a safe bet. Still, WAR/600 (or WAR/200 innings pitched for starting pitchers, or WAR/65 innings pitched for relievers) is useful for comparing players with different PA or IP totals, so long as you use it reasonably and not to extrapolate some guy’s hot week to the rest of the year.
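Here's the rate calculation from the parenthetical above as a tiny sketch. The function name is hypothetical; the baselines (600 PA, 200 IP, 65 IP) are the ones mentioned above.

```python
def war_per(war, playing_time, baseline):
    """Scale a WAR total to a common playing-time baseline.

    playing_time and baseline must be in the same unit
    (plate appearances for hitters, innings for pitchers).
    """
    if playing_time <= 0:
        raise ValueError("playing_time must be positive")
    return war / playing_time * baseline


# The two hitters from the paragraph above:
full_timer = war_per(0.5, playing_time=600, baseline=600)
small_sample = war_per(0.5, playing_time=60, baseline=600)

print(f"0.5 WAR in 600 PA -> {full_timer:.1f} WAR/600")
print(f"0.5 WAR in 60 PA  -> {small_sample:.1f} WAR/600")
```

The second number is exactly the kind of figure to be suspicious of: it assumes 60 PAs of performance holds up over ten times the playing time.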

For hitters, WAR is the sum of a bunch of run values -- hitting runs above average, fielding runs above average, and baserunning runs above average. Then, because there’s a difference between replacement level and average, the 20 runs of difference (per full season) are added back to a hitter to convert his “stats above average” into “stats above replacement.” For pitchers, since they don’t do different things (they just pitch), the conversion is more direct -- it’s really just “how well did you pitch” and then “so what’s the WAR value of pitching that well?”
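As a rough sketch of the hitter side, using only the pieces described in this primer: add up the three run components, add back the replacement-level gap, and convert runs to wins. The names are mine, the 10 runs per win is the rule of thumb from above, and treating 600 PA as a "full season" (and prorating the 20 runs linearly) is an assumption. Published WAR figures involve more than this, so treat it as an illustration of the structure, not a recipe for reproducing anyone's numbers.

```python
from dataclasses import dataclass

RUNS_PER_WIN = 10.0          # rule of thumb; the real figure tends to be a bit lower
REPLACEMENT_GAP_RUNS = 20.0  # average minus replacement, per full season
FULL_SEASON_PA = 600         # assumption: what counts as a "full season"


@dataclass
class HitterSeason:
    pa: int
    batting_runs: float      # all three are runs above average
    fielding_runs: float
    baserunning_runs: float


def hitter_war(season: HitterSeason) -> float:
    """Sum the components, shift from 'above average' to 'above replacement',
    then convert runs to wins."""
    runs_above_average = (
        season.batting_runs + season.fielding_runs + season.baserunning_runs
    )
    # Prorate the replacement gap by playing time (linear proration is assumed).
    replacement_runs = REPLACEMENT_GAP_RUNS * season.pa / FULL_SEASON_PA
    return (runs_above_average + replacement_runs) / RUNS_PER_WIN


# Hypothetical players: perfectly average, and a great hitter who gives it all back.
average_guy = HitterSeason(pa=600, batting_runs=0, fielding_runs=0, baserunning_runs=0)
bat_only = HitterSeason(pa=600, batting_runs=20, fielding_runs=-30, baserunning_runs=-10)

print(f"Average player: {hitter_war(average_guy):.1f} WAR")
print(f"Great bat, bad everything else: {hitter_war(bat_only):.1f} WAR")
```

Note how the second player nets out to 20 runs below average overall, which is the definition of replacement level, no matter how he got there.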

Right now, most people tend to mention either bWAR (also called rWAR, Baseball-Reference) or fWAR (Fangraphs) when referring to WAR. In some ways, these aren’t very different. In other ways, they are worlds apart. This table is incredibly helpful in sifting through the differences: https://www.baseball-reference.com/about/war_explained_comparison.shtml. For position players, fWAR and bWAR will be pretty similar. Where they differ most is fielding: bWAR uses one fielding metric, and fWAR uses another. The bWAR one tends to be more aggressive (higher highs and lower lows), while the fWAR one tends to be more central. As a result, position players often have more extreme bWARs than fWARs if they’re especially good or bad fielders. Just be clear about which one you’re using.

For pitchers, bWAR and fWAR do completely different things. fWAR only credits a pitcher for things he was responsible for himself, that is, homers, walks, strikeouts (and infield pops, which are basically strikeouts because they require little effort to field). bWAR instead credits a pitcher for how many runs he allowed, and then modulates it up or down based on the defense behind him. I am partial to the fWAR method here, and find the bWAR method logically inconsistent. With fWAR, each play results in a credit to the hitter if he reaches base, and a debit to either the pitcher (if a walk or homer) or the fielder (if a non-homer hit); if the hitter fails to reach base, he gets debited and the pitcher gets credited (if a strikeout) or the fielder (if a groundout/fly out). In bWAR, this doesn’t happen -- both the pitcher and fielder get credit on an out in the field, and while the adjustment for team fielding quality attempts to back out any double-counting, it’s a lot less elegant and convincing. Plus, there’s no real reason to commingle them: under the principle of crediting players for what they do/don’t do, it’s not clear to me why pitchers should be credited or debited for their defense to begin with, even if there’s an adjustment later to try to “fix” it.
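To make the "only things he was responsible for himself" idea concrete: the fWAR approach is built on FIP (Fielding Independent Pitching), and the sketch below uses the standard FIP weights with infield flies counted alongside strikeouts, as described above. A few hedges: the function name is mine, hit batters are lumped in with walks (that's how the standard formula treats them, even though I didn't mention them above), and the constant changes from season to season, so it's a required argument here rather than something I'm going to guess at. Getting from this number to actual WAR takes further steps that aren't shown.

```python
def fip_with_infield_flies(hr, bb, hbp, k, infield_flies, innings, constant):
    """FIP-style rate stat using only outcomes the pitcher controls.

    innings must be true decimal innings (200 and 1/3 innings is 200.333...,
    not the box-score shorthand 200.1).
    constant is the league/season scaling constant; look it up, don't assume it.
    """
    if innings <= 0:
        raise ValueError("innings must be positive")
    # Homers hurt most, free passes hurt some, strikeouts and
    # infield pop-ups (treated like strikeouts) help.
    numerator = 13 * hr + 3 * (bb + hbp) - 2 * (k + infield_flies)
    return numerator / innings + constant


# Hypothetical pitcher line and a made-up constant, for illustration only.
example = fip_with_infield_flies(
    hr=20, bb=50, hbp=5, k=190, infield_flies=10, innings=200.0, constant=3.1
)
print(f"FIP-style figure: {example:.3f}")
```

Notice what isn't in there: hits allowed on balls in play, or runs allowed. That's the whole philosophical difference from the bWAR approach.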

Neither fWAR nor bWAR currently takes catcher framing into account, so you need to watch out for this if you’re valuing catchers; however, WARP, which is Baseball Prospectus’ WAR-type stat, does do this.

It more or less goes without saying that more WAR is better. But don’t sweat small WAR differences. The general range for “eh this is more or less the same” is 0.5 WAR. So a guy at 3 WAR and a guy at 1 WAR are markedly different. A guy at 2.9 versus another at 2.7? Well, it would feel weird to argue about that just on the basis of those two numbers and nothing else.
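If it helps, that rule of thumb fits in a couple of lines. The function name and the choice to treat a gap of exactly 0.5 as "still basically the same" are mine; the 0.5 threshold is the one from the paragraph above.

```python
SAME_TIER_GAP = 0.5  # "eh, this is more or less the same" range, in WAR


def meaningfully_different(war_a, war_b, gap=SAME_TIER_GAP):
    """True only when two WAR totals differ by more than the rule-of-thumb gap."""
    return abs(war_a - war_b) > gap


print(meaningfully_different(3.0, 1.0))  # the 3 WAR guy vs. the 1 WAR guy
print(meaningfully_different(2.9, 2.7))  # the 2.9 guy vs. the 2.7 guy
```

It's a sanity check against over-reading decimals, nothing more; two players inside the gap can still differ in ways worth talking about, just not on the basis of WAR alone.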

With that said, and as a reward for making it through this, here are some charts showing the WAR distribution in 2018 for any player with 200+ PAs.

Once you account for playing time, you can see that WAR is pretty normally distributed. The most common thing for a major league player to be is average, though when accounting for playing time, this means players will most commonly accrue only around 1 WAR.

For starting pitchers, using a cutoff of at least 70 innings pitched as a starter, the breakdown looks like this.

On a rate basis, once again an average starter is the most common outcome, though there’s been more of a trend of higher-quality pitchers throwing fewer innings as starters (compare the red bars with the blue bars).

And, for relievers, minimum of 20 innings pitched in relief, the breakdown is as follows. Basically, good relief help is hard to find: unlike position players and starters, the modal reliever tends to be pretty close to replacement level, even on a rate basis. Also note that this isn’t a usage thing: WAR/65 for relievers tends to have 0 as its mode no matter where you draw the innings limit. In other words, even the most commonly-used relievers aren’t that good, as a group.

The remainder of this primer focuses on analysis of individual components. Some of these components feed into WAR, some don’t. WAR is retrospective: it tells you what happened. When analyzing players, however, you don’t only want to look at the past, as looking forward can be quite informative.

tl;dr takeaway for WAR - WAR is great for comparing any two players, no matter what position they play or what era they played in. It also correlates incredibly well with how many games a team wins by the end of the season. If you have only two seconds to assess how good a player is, look at his WAR.