I cringe when I read the word regression. If a player is outplaying his peripherals, then he is lucky. Sooner or later, he will come back to earth. Or so I have been told. It is, as Kelly Bundy put it, as inevitable as death and Texas. For the most part, that’s what happens. And you have to be careful about making quick assumptions, especially in small sample sizes. But sometimes there are other forces at work that we don’t understand that have more weight than luck.
Johan Camargo was supposed to come back to earth in 2019. After all, he outperformed his xwOBA by .040 last year. “So this year, he’d underperform by .040 to balance the universe, correct?” one might say. Well, as of September 11th, he is outperforming by .002. But the universe is punishing him with a bad year because of his good one, right? I mean, maybe, but there are other things in play. Last year, he was lucky and good, especially with line drive home runs. This year, he didn’t adjust to a bench role, he doesn’t have a traditional power cross in a season of big home run numbers, and he suddenly couldn’t hit breaking pitches.
Kevin Gausman was a player who was unlucky this year. His ERA was a full two runs over his FIP. Given enough time, the universe should balance itself, right? He should start performing well again, correct? Well perhaps, but his FIP/ERA gap wasn’t told about his slider and splitter. You see, the slider is poor and the split-finger crept up in the zone. On split-fingers thrown in Gameday zones 1-6 (middle or high strikes), he underperformed his xwOBA by .177. That should balance out, until you think about it. His splitter is an 83 MPH pitch without much break, thrown over the plate. What do you expect would happen? With 95+ MPH pitches, Major League hitters rely on launch angle and picking a field. But with a high, slow breaking pitch, they can guide the ball wherever they like.
Estimators like xwOBA and FIP are very helpful in determining true talent level. Expected weighted on-base average is probably the best advance in sabermetrics that I have seen. Announcers used to say “he’s 1 for his last 20 but he’s really hitting the ball well lately.” Using xwOBA without seeing the hitter, you can say things like, “in his last 25 PA, his wOBA is .079 but his xwOBA is .379 so his swing is probably ok.” But like any estimator, there are edge cases and instances where it is not perfect.
You can apply an estimator to a team’s won-loss record as well. Bill James figured out that if you apply a formula resembling the Pythagorean theorem to a team’s runs scored and allowed, you come up with a pretty good estimate of the team’s won-loss record. He noticed that runs scored squared over the sum of runs scored squared and runs allowed squared got pretty close to a team’s actual winning percentage. Later, it was discovered that if you change the exponent from 2 to 1.83, you get better results. It will get most teams’ records to within 5 games.
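For anyone who wants to see the arithmetic, here is a minimal Python sketch of the estimate described above. The function name and the sample run totals (800 scored, 700 allowed) are made up for illustration and don’t belong to any real team; the only things taken from the text are the two exponents, 2 and 1.83.

```python
def pythag_win_pct(runs_scored, runs_allowed, exponent=1.83):
    """Estimated winning percentage from runs scored and runs allowed."""
    scored = runs_scored ** exponent
    allowed = runs_allowed ** exponent
    return scored / (scored + allowed)


# Hypothetical season totals, not any real team's numbers.
runs_scored, runs_allowed, games = 800, 700, 162

for exponent in (2, 1.83):
    pct = pythag_win_pct(runs_scored, runs_allowed, exponent)
    print(f"exponent {exponent}: {pct:.3f} win pct, about {pct * games:.1f} wins")
```

With these invented totals, the two exponents land about a win apart over 162 games.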
The importance of attaching runs to games won is one of the foundations of sabermetrics. It is intuitively obvious that runs are very important to winning games. Most if not all of the advanced metrics can be boiled down to the ability to score or prevent runs. Not necessarily the results stats, like RBIs, but measures of the attributes that will score runs over the long term. Teams with these attributes will score or prevent more runs over a season, and thus win more games.
But Pythagorean run record estimation has its limits. It doesn’t do as well with very good and very bad teams as it does with average teams. In the divisional era since 1980, 71.5% of playoff teams outplayed their Pythagorean run record that season, and 75.7% of World Series winners did as well. (Ivan: this is basically direct selection bias at work, at least partially, as a team underperforming its run differential is less likely to make the playoffs in the first place.) In that time period, 64.7% of teams finishing fourth or worse underperformed their Pythagorean run record (not including those that finished in the middle, fourth out of seven). Among teams that finished in the middle, at third out of five or fourth out of seven, almost exactly half (50.3%) underperformed and almost exactly half outperformed. So the Braves outperforming their Pythagorean run record is probably a good thing, as it falls in line with most playoff teams. I would rather the Braves’ won-loss record run a little hot than not.
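The selection-bias point in the note above can be checked with a toy simulation. This is a rough sketch under loud assumptions, none of which come from real data: every team’s Pythagorean expectation is treated as its true quality, each game is an independent coin flip at that rate, and the spread of team quality (talent_sd) and the seed are picked arbitrarily. No team here does any small things well; the only thing separating actual record from expected record is luck.

```python
import random


def simulate(n_teams=3000, games=162, talent_sd=0.06, seed=2019):
    """Share of teams beating their expected record, by finish in the standings."""
    rng = random.Random(seed)
    teams = []
    for _ in range(n_teams):
        # Assumed: expected win pct is the team's true quality (clamped to [0, 1]).
        expected_pct = min(max(rng.gauss(0.5, talent_sd), 0.0), 1.0)
        # Each game is an independent coin flip at that rate.
        wins = sum(rng.random() < expected_pct for _ in range(games))
        teams.append((wins, wins - expected_pct * games))  # (actual wins, luck)

    teams.sort(key=lambda team: team[0])  # worst actual record first
    third = n_teams // 3

    def share_over(group):
        return sum(1 for _, luck in group if luck > 0) / len(group)

    return {
        "bottom": share_over(teams[:third]),
        "middle": share_over(teams[third:2 * third]),
        "top": share_over(teams[2 * third:]),
    }


shares = simulate()
for tier, share in shares.items():
    print(f"{tier} third: {share:.1%} beat their expected record")
```

If the top third by actual wins still beats its estimate more often than not, and the bottom third falls short more often than not, then luck plus sorting by final record could produce at least part of the pattern. That doesn’t rule out the small-things explanation; it just means the percentages alone can’t tell the two apart.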
Basically, playoff teams outperform, bad teams underperform, and middle teams are in the middle. What can we learn from that? Two ideas:
1) Playoff teams might excel at doing the small things well. They do the small things that fall through the cracks of estimation. They might play a little better in high leverage. Their managers might be a little cagier in close games. Bad teams probably do the opposite.
2) The formula might need to be adjusted for overdampening. The Pythagorean approach doesn’t account for edge cases well. This formula might need to be adjusted if you only have two inputs, runs scored and runs allowed. But any adjustment could lead to overfitting as well. It does what it does well enough.
The answer is probably a little bit of both. Playoff teams do seem to perform well against lousy teams and pull out improbable games (see Nationals v. Marlins and Mets). And the formula has trouble at both ends so an adjustment is probably overdue. I have been toying with using Pythagorean (home run) record with an exponent of 2. It’s not as accurate, but pretty good. The numbers produced in relation to overperformance with it fall in line with what was seen with traditional Pythagorean.
The Braves have made 18 postseason appearances since 1980. In 12 of them, they outperformed their Pythagorean run record. All 10 of the Braves teams that reached the World Series or League Championship Series underperformed their Pythagorean HR record. Right now the Braves are overperforming (.615 actual record vs .600 HR-Pythag estimate). So by those two metrics, the 2019 Atlanta Braves look like a team that will make the playoffs and lose in the first round. That is only an estimate, but it seems believable.
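Here is a small sketch of the home run version, along with what a .015 gap means in wins. The function names and the home run totals (245 hit, 200 allowed) are hypothetical, chosen only because they land near a .600 estimate; they are not the Braves’ actual totals. The .615 and .600 figures are the ones quoted above, and the exponent of 2 is the one mentioned earlier.

```python
def hr_pythag_win_pct(hr_hit, hr_allowed, exponent=2.0):
    """Pythagorean-style estimate using home runs in place of runs."""
    hit = hr_hit ** exponent
    allowed = hr_allowed ** exponent
    return hit / (hit + allowed)


def gap_in_wins(actual_pct, estimated_pct, games=162):
    """Translate a winning-percentage gap into wins over a schedule."""
    return (actual_pct - estimated_pct) * games


# Hypothetical home run totals, picked to land near .600.
estimate = hr_pythag_win_pct(245, 200)

# Figures quoted in the text: .615 actual vs .600 HR-Pythag estimate.
gap = gap_in_wins(0.615, 0.600)

print(f"HR-Pythag estimate: {estimate:.3f}")
print(f"Gap over 162 games: {gap:.2f} wins")
```

Stretched over a full 162-game schedule, a .015 gap works out to roughly 2.4 wins.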
By the way, these estimates assume that an MLB team is a monolithic structure that doesn’t change over the course of the season. The Braves’ rotation is much better than it was on Opening Day, with Kevin Gausman, Sean Newcomb, and bad four-seamer/amoeba slider Mike Foltynewicz out and Dallas Keuchel, Mike Soroka, and good two-seamer/less-amoeba-slider-maybe Mike Foltynewicz in. The 5-8 spots in the lineup have been poor (outside of Matt Joyce) while awaiting two injured players and unlucky-and-worse-since-return Dansby Swanson. The bullpen has largely been a mess but has been solid at the back lately. Judging by Brian Snitker’s selections when playing the Phillies, he seems to be evaluating the front half of the bullpen for the “hottest hand.”
I try to enjoy baseball and have a beer, because it can remain irrational longer than you can remain sober. If the Braves start to lose big games down the stretch and/or in the playoffs, it won’t be because of the regression “hand of God.” It will be for other baseball reasons, such as the back five in their lineup continuing to play poorly and their bullpen imploding.