Quick review: league-wide FIP distribution

And some other thoughts about pitching talent levels

League Championship - Atlanta Braves v Los Angeles Dodgers - Game Seven Photo by Rob Carr/Getty Images

Hello, friends. About a month ago, we looked at the league-wide distribution of hitting outcomes, as expressed through wRC+. This is a follow-on that focuses on the flip side: what is the league-wide distribution of pitching outcomes?

To talk about this, we’re going to focus on FIP, and more specifically, FIP-. As a quick refresher, FIP is a stat set to resemble ERA (i.e., earned runs per nine innings) that is driven only by strikeouts, walks (plus hit batters), and home runs allowed. Especially for this exercise, FIP is probably the best available measure of pitching outcomes, and the best counterpart to wRC+. Why is it better than anything else? Because it excludes anything having to do with fielding (or sequencing), letting us focus on pitching.

At this point, you might say, “Wait, I don’t actually care about the distribution of FIP! I only care about the distribution of ERA!” Well, good news: FIP is set to the ERA scale, such that in any season, league FIP is going to equal league ERA by definition. Fundamentally, if you’re looking at the distribution of the skillset of the population, you’re not losing anything at all by focusing on FIP.

FIP-, then, is just kind of the same as wRC+, but for pitching: wRC+ takes the hitter’s average hitting outcome and expresses it in terms of how much better or worse that is compared to league average, adjusted for park; FIP- takes the pitcher’s FIP and expresses it in terms of how much better or worse it is compared to league average, adjusted for park. (Just remember that the scale is flipped: a FIP- below 100 is better than average, while a wRC+ below 100 is worse.) Basically, wRC+ drives the hitting part of fWAR, and FIP- drives the pitching part (with a few edge-case differences, like how fWAR for pitchers also includes leverage for relievers and infield pops, which don’t factor into FIP).
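If it helps to see the moving parts, here is a minimal Python sketch of how FIP and FIP- fit together. The 13/3/2 weights are the standard FIP weights (the standard formula counts hit batters alongside walks). The league totals are made-up numbers for illustration only, and `fip_minus` is a simplified version that divides by a hypothetical `park_factor`; the published FIP- uses a somewhat more involved park adjustment.

```python
def fip(hr, bb, hbp, k, ip, constant):
    """FIP on the ERA scale. `ip` is true fractional innings (e.g. 66.667)."""
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / ip + constant


def fip_constant(league_era, hr, bb, hbp, k, ip):
    """Constant chosen so that league FIP equals league ERA by definition."""
    return league_era - (13 * hr + 3 * (bb + hbp) - 2 * k) / ip


def fip_minus(pitcher_fip, league_fip, park_factor=1.0):
    """Simplified FIP-: 100 is league average, lower is better.

    Assumption: a plain ratio with a multiplicative park factor,
    not the exact published park adjustment.
    """
    return 100 * (pitcher_fip / park_factor) / league_fip


# Made-up league totals, purely for illustration (not real 2019 numbers).
LEAGUE = dict(hr=6000, bb=15000, hbp=1800, k=40000, ip=43000.0)
LEAGUE_ERA = 4.50

C = fip_constant(LEAGUE_ERA, **LEAGUE)
league_fip = fip(**LEAGUE, constant=C)  # lands back on LEAGUE_ERA

# A hypothetical pitcher line: 20 HR, 50 BB, 5 HBP, 200 K in 180 innings.
example_fip = fip(hr=20, bb=50, hbp=5, k=200, ip=180.0, constant=C)
example_fip_minus = fip_minus(example_fip, league_fip)
```

The point of `fip_constant` is the "good news" above: because the constant is backed out of league ERA, the league-wide FIP and ERA match, so the center of the distribution is the same either way.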

Anyway, without further ado, you really just need this table:

Note that this table is all pitchers... and interestingly, while 831 humans threw a major league pitch in 2019, if you take just the top 575 by innings pitched, the median FIP- is 100. Even if you ramp up to the top 200ish, you’re already getting an FIP better than league average. To see why this is perhaps unexpected, check out this table again, this time right next to the league-wide wRC+ distribution:

Out of 635 hitters in 2019, you need to drop around 43 percent of them to get the remaining ones to center around a league-average hitting performance. But out of 831 pitchers, you only need to drop around 31 percent to get the same result.
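For anyone who wants to reproduce that "how many do you have to drop" number on their own data, here is a rough Python sketch. It assumes the groups are ranked by innings pitched and that you have one `(innings, fip_minus)` pair per pitcher; the function name and the toy list are made up for illustration.

```python
from statistics import median


def largest_group_at_or_below(pitchers, target=100):
    """Return the size of the largest top-by-innings group whose median
    FIP- is at or below `target`, or None if no group qualifies.

    `pitchers` is a list of (innings, fip_minus) tuples.
    """
    ranked = sorted(pitchers, key=lambda p: p[0], reverse=True)
    # Start with everyone, then peel off the lowest-innings pitcher each time.
    for n in range(len(ranked), 0, -1):
        if median(fm for _, fm in ranked[:n]) <= target:
            return n
    return None


# Toy data, not real pitchers: (innings, FIP-).
toy = [(200, 80), (180, 90), (150, 100), (60, 110), (20, 130), (5, 160)]
kept = largest_group_at_or_below(toy)

# The share dropped in the article's 2019 pitcher numbers: 575 kept out of 831.
share_dropped_2019 = round(1 - 575 / 831, 3)
```

With the toy list, all six pitchers have a median above 100, but dropping the lowest-innings guy gets the median to exactly 100. The same arithmetic on 575 of 831 gives the "around 31 percent" figure above.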

That’s not all, though. Look at the numbers in each column between the two tables. (Note that the column headings are proportional — they’re each the same share of 600 PAs or 200 IP, i.e., 50/600 basically = 17/200.) While it’s somewhat subtle, the pitching talent distribution seems to be tighter, basically across the board. At a 200-PA cutoff, the 40th percentile hitter is seven percent worse than league average, and the 60th percentile hitter is seven percent better, a 14-percentage point spread. But at the same cutoff for pitchers (67 IP), the spread is just nine percentage points (93 to 102, around a 50th percentile of 98). If you think about it, this tightness is intuitive: position players are employed for multiple reasons, such that some hitters are really there just to hit, some are there to not hit (less common these days), and some provide value in multiple ways. However, pitchers are basically just there to pitch, so you don’t get contamination of the group from guys that are there to... not... pitch?
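The column matching and the spread comparison are just arithmetic, sketched here in Python. The 600 PA and 200 IP "full season" baselines and the percentile values come from the paragraph above; the function name is made up.

```python
def pa_to_ip(pa, full_pa=600, full_ip=200):
    """Scale a plate-appearance cutoff to the equivalent innings cutoff."""
    return pa * full_ip / full_pa


# 50 PA and 200 PA map to roughly 17 IP and 67 IP.
ip_cutoffs = [round(pa_to_ip(pa)) for pa in (50, 200)]

# 40th-to-60th percentile spreads at the 200 PA / 67 IP cutoff.
hitter_spread = 107 - 93   # seven percent worse to seven percent better
pitcher_spread = 102 - 93  # FIP- of 93 to FIP- of 102
```

So the middle fifth of hitters covers 14 points of wRC+, while the middle fifth of pitchers covers just nine points of FIP-.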

In any case, you may have noticed that the tables above are somewhat unsatisfying, because we generally don’t think of pitchers as pitchers. (We will one day, but not yet, I guess.) Instead, we still have a world differentiated into starters and relievers, and there’s a theoretical expectation that these groups pitch differently. Except, well...

For a long time (this chart shows the past century of baseballing), relievers weren’t really a thing. A team’s best pitchers started and threw most of the frames, and the relievers were basically (much worse) reserves. Yet, over time, there were more relief innings, and the class of relievers got better (and closer to starter talent levels). This changed in the 70s, where we got a more “modern” pattern:

This chart reflects mostly conventional thinking at this point: teams keep using more and more relievers relative to starters, and the relievers pitch somewhat better. But! Look at the very tail of this chart! The gap has basically closed. 2019 was the first year in decades where the starter and reliever lines basically converged. This makes sense, and not just in a trend sense: in a perfect, modeled baseball world, we would definitely expect that teams use starters right to the point where they can be subbed out for a reliever, and no further. We wouldn’t expect relievers to be worse than starters, because if so, you could bump off the worst relievers and replace their innings by using starters more. Nor would we expect starters to be worse than relievers, because if so, you could just take away their last few innings and give them to relievers. So we may have hit some kind of parity at this point.

This hopefully helps to clarify the tables below, which are, of course, based on 2019 — but don’t show any differential in aggregate performance between the two groups.

The cool thing about this table: the same ratio of 200/600 for hitters holds for starters to get a median starter with a league-average FIP. (67/200 is the same as 200/600, but for rounding.) So if you’re trying to judge how well a starter is doing in a given stat, aim for around 60-70 innings pitched as a starter as a cutoff to find comparables. (That was the point of this whole exercise.)

For relievers, who are more interchangeable, the right cutoff is somewhere between 11 and 22 innings. Which is not very many innings! Relievers are an ephemeral, dream-like class of humans. I’m not too sure why you’d really be interested in comparing relievers against one another anyway, but perhaps you’re a masochist. And if you are, now you know the right cutoff to use.