Blast from the past: Looking for a Leo Mazzone effect

The results are somewhat complicated, but suggestive.

The best things in life come, as they often do, from the comments section. With respect to judgments cast on pitching coaches, this is no exception. Pitching coaches themselves are a slippery topic. Who among us can really say, with definitive clarity, “Yes, the pitching coach caused this to occur!” My own inclination is generally to credit the players for pretty much everything. In the end, the players are the ones executing, and in a paradigm where we attempt to quantify and value everything, that value flows back to the players. There’s 1,000 WAR in a season to be distributed across players in the box score; we don’t, as yet, have a framework for a separate “coaching reservoir” of WAR that would get its own set of numbers. (And if we did, that’d make the current relationship between player WAR, team WAR, and team wins more complicated, for sure.)

Leo Mazzone is pretty singular as far as pitching coaches go. If you’re a Braves fan, you’re probably aware that he spent over 1.5 decades in that role in Atlanta (1990 through 2005), which seems outlandish in today’s era of non-traditional coaching arrangements, rapid managerial turnover, and diversification of strategic responsibilities to front office and baseball operations staff. You’re also probably aware that Mazzone’s legacy is perhaps something akin to one of the (then) best in the business. The question crops up naturally: was this deserved? That is, was it the product of Mazzone himself, or of the fact that he happened to be nearby as a dominant pitching syzygy aligned in Braves uniforms in the 1990s? Unfortunately, in said comments exchange, an answer to that question wasn’t forthcoming. If I had my druthers, I could just ask questions and get the answers to them. Of course, with said druthers-having, I would also have delicious food materialize in front of me without having to cook. Alas, that’s not the universe I live in. So, off I went.

TC commentariat member casmith12 provided the first inroad into a potential investigation, linking an SB Nation longform article by Tony Rehagen that’s well worth a read if your interests are “Leo Mazzone,” “pitching coaches,” or “good baseball writing.” What really interested me (as brought up by casmith12 as a very salient point) therein was this quote:

Economist and Sabernomics blogger J.C. Bradbury looked at the park-adjusted ERAs for 98 pitchers who had thrown at least 30 innings in a season for Mazzone and compared those seasons to those pitchers’ numbers for other teams and found that Mazzone lowered a pitcher’s ERA about .63, more than half a run. Bradbury’s conclusion: “Starters and relievers pitched worse both before and after playing for Mazzone.”

That led me to find more on that particular analysis, which segued into a New York Times article containing the relevant result (this one is much shorter so don’t be afraid to click even if you’re pressed for time). So, in some ways, this could represent an endpoint. There you have it: Mazzone lowered a pitcher’s ERA relative to either pre-Mazzone or post-Mazzone performances by the same pitcher. The end, right? Well, it could be, except for one tiny idiosyncratic thing about me: I don’t care about ERA. This isn’t the space for a screed about all that’s wrong with ERA, but suffice to say, that didn’t prove satisfying to me. If someone had said something about strikeout rates or walk rates, I’d probably be content. But nope, ERA was all that was there. Given a very unsatisfying answer, then, I set out to do my own quick look at the Mazzone effect. The rest of this article is what I found.

The setup

If ERA is unsatisfying, the presumption is that something else exists to serve as a better answer. If you know me at all, you’re probably aware I’m going to choose something like FIP, and that awareness is correct. FIP takes sequencing, defense, and some kinds of luck out of the equation. It doesn’t filter out everything (whether a fly ball becomes a homer, catcher framing, and umpiring are all things that affect pitchers and aren’t controlled for by using FIP), but it does get the big stuff. In addition, FIP exists in a minus format (that is, FIP-), which converts an individual number like 4.15 to a scale where 100 is league average, and each point below/above 100 reflects run prevention that many percentage points better or worse than league average. FIP- is also park-adjusted, which helps control for differences in run prevention associated with playing in different stadiums. (Is it stadiums or stadia? English is weird.)

(Initially, I also wanted to use xFIP as another point of analysis, but it relies on batted ball data and is therefore not available prior to 2002 in any easily-accessible way.)
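
For anyone who wants to see the arithmetic behind those two stats, here is a minimal Python sketch. The FIP formula is the standard published one; the constant (roughly 3.10, but it changes every season) and the FIP- calculation are simplified assumptions. In particular, the park adjustment below just divides by a park factor, which is not the exact adjustment the public stat sites use, and the function names are made up for illustration.

```python
def fip(hr, bb, hbp, k, ip, constant=3.10):
    """Fielding Independent Pitching.

    `constant` rescales FIP to the league ERA level; 3.10 is only a
    typical value and is an assumption here, since it varies by season.
    `ip` is innings as a plain number (e.g. 200.0), not "200.1" notation.
    """
    return (13 * hr + 3 * (bb + hbp) - 2 * k) / ip + constant


def fip_minus(pitcher_fip, league_fip, park_factor=1.0):
    """Simplified FIP-: 100 is league average, lower is better.

    Assumption: the park adjustment is a plain division by a park
    factor (1.0 = neutral park). Real implementations differ in detail.
    """
    return 100 * (pitcher_fip / park_factor) / league_fip


if __name__ == "__main__":
    # Hypothetical stat line, not a real pitcher.
    example = fip(hr=20, bb=50, hbp=5, k=180, ip=200.0)
    print(round(example, 3))
    # A 4.15 FIP in a league where 4.15 is average sits at 100.
    print(fip_minus(4.15, league_fip=4.15))
```

The point of the "minus" scale is visible in the second function: the same raw FIP maps to a different number depending on the league environment, which is what makes comparisons across 1976 through 2015 at least somewhat reasonable.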

For the analysis, I focus specifically on Mazzone’s Braves tenure, which lasted from 1990 to 2005. While Mazzone also worked as the Orioles’ pitching coach in 2006 and 2007, the number of pitchers he encountered over those two seasons is a pittance relative to those he worked with in the 16 prior years, and I didn’t want to create any implication of biasing results by incorporating a situation that was anecdotally not fully agreeable to Mazzone or his coaching system into my data. Bradbury focused on all pitchers with 30 or more innings thrown under Mazzone, whereas I did something a little different:

  • First, I focused only on pitchers with 60 or more innings both under Mazzone and under another pitching coach. I didn’t want small samples to affect my comparisons, especially since homers tend to be a rare event with substantial impact on FIPs in the small samples where they either occur a lot or don’t occur at all.
  • Second, I excluded some pitchers that repeatedly bounced in and out of Mazzone’s coaching in the mid-1990s. These few players defied basic “pre/post” structures, and while a semi-rigorous regression analysis could handle them easily, what I wanted to display as an endpoint was not a black-box result but something built on basic arithmetic.
  • Third, I didn’t exclude any pitcher so long as he met these two criteria. That is, if a pitcher either never had a different major league pitching coach before Mazzone, or retired after working with Mazzone, I didn’t kick them out.
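
The three criteria above boil down to a simple filter. Here is a hedged sketch of what that filter might look like; the `PitcherSummary` fields and the example rows are hypothetical, and the actual selection was done by hand rather than with this code.

```python
from dataclasses import dataclass

MIN_IP = 60.0  # innings threshold from the first criterion


@dataclass
class PitcherSummary:
    # All field names are hypothetical, chosen for illustration only.
    name: str
    ip_with_mazzone: float      # innings under Mazzone, 1990-2005
    ip_elsewhere: float         # innings under any other pitching coach
    bounced_in_and_out: bool    # left Mazzone's staff and later came back


def qualifies(p: PitcherSummary) -> bool:
    """Apply the first two criteria; the third adds no further exclusions."""
    enough_innings = p.ip_with_mazzone >= MIN_IP and p.ip_elsewhere >= MIN_IP
    return enough_innings and not p.bounced_in_and_out


if __name__ == "__main__":
    # Placeholder rows, not real pitchers.
    pool = [
        PitcherSummary("Pitcher A", 850.0, 400.0, False),
        PitcherSummary("Pitcher B", 45.0, 900.0, False),   # too few innings
        PitcherSummary("Pitcher C", 300.0, 300.0, True),   # in-and-out case
    ]
    print([p.name for p in pool if qualifies(p)])
```

Note that a pitcher with only a pre-Mazzone or only a post-Mazzone career still passes, as long as his innings elsewhere clear the threshold.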

The end result: 50 pitchers, even, with careers spanning 1976 through 2015. These pitchers worked with Mazzone as early as age 20 and as late as age 43, and at least one pitcher worked with him in every season between 1990 and 2005.

The data collection was fairly simple: I separated the careers of each of these 50 pitchers into three buckets: the Mazzone bucket, the pre-Mazzone bucket, and the post-Mazzone bucket. For each, I grabbed the FIP-. That’s really it.

A first, basic look

Yes, it’s possible to do all sorts of statistical stuff with these data. But we’ve got a sample size of 50, we’re spanning multiple eras of baseball, and we’re going to skirt right past the fact that any Mazzone Effect probably didn’t come fully-formed on Opening Day 1990, but probably got developed over time (if it exists at all). So, let’s just talk about basic numbers. In cutting up every player’s career into these Mazzone/pre-Mazzone/post-Mazzone buckets, we can easily compare the effect on FIP across them. In doing this comparison, I found the following:

  • Of the 50 players, 26 (so just over half) did “better” by FIP- under Mazzone than at another point. In this case, I’m broadly defining “better” as an FIP- gap of 10 or more. So, for example, Greg Maddux, who had a 68 FIP- under Mazzone, an 86 FIP- before Mazzone, and a 91 FIP- after Mazzone counts in this category.
  • Another eight players (16 percent) had ambiguous results. These constitute the cases where the effect isn’t consistently present. For example, John Smoltz went from a 99 FIP- pre-Mazzone to a 77 FIP- with Mazzone, but then after Mazzone, he finished with a 75 FIP-. The effect isn’t consistent. Or, you have someone like Charlie Leibrandt, who showed the right direction in terms of Mazzone presence/absence, but not a big enough effect (97 FIP- pre, 88 with, 95 post-).
  • The remaining 16 players (32 percent) did not have lines that showed an effect in this regard. For example, Horacio Ramirez had a 118 FIP- under Mazzone’s tutelage, and a 110 FIP- for his career afterwards (he never pitched in a major league game prior to his Mazzone exposure).
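
The three-way split above can be approximated with a small rule. The sketch below is a reconstruction, not the exact procedure used for the article: it assumes "better" means every available comparison (pre and/or post) shows a gap of at least 10 points of FIP-, "ambiguous" means at least one comparison points in the right direction without all of them clearing 10, and "no effect" is everything else. Some of the actual groupings were judgment calls, so edge cases may not match.

```python
from typing import Optional

GAP = 10  # minimum FIP- gap for a "better" call, per the text


def classify(pre: Optional[float], with_mazzone: float,
             post: Optional[float], gap: float = GAP) -> str:
    """Classify one pitcher from his three FIP- buckets.

    `pre` or `post` may be None when the pitcher has no such bucket.
    The rule itself is an assumption reconstructed from the examples.
    """
    # Positive gap = pitcher was better (lower FIP-) under Mazzone.
    gaps = [other - with_mazzone for other in (pre, post) if other is not None]
    if not gaps:
        raise ValueError("need at least one non-Mazzone bucket")
    if all(g >= gap for g in gaps):
        return "better"
    if any(g > 0 for g in gaps):
        return "ambiguous"
    return "no effect"


if __name__ == "__main__":
    # FIP- figures quoted in the article.
    print(classify(86, 68, 91))     # Greg Maddux      -> better
    print(classify(99, 77, 75))     # John Smoltz      -> ambiguous
    print(classify(97, 88, 95))     # Charlie Leibrandt -> ambiguous
    print(classify(None, 118, 110)) # Horacio Ramirez  -> no effect
```

The rule reproduces the four examples quoted in the bullets, which is about all that can be checked without the full tables.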

Below are tables for each of the three.

This is the “better” group. The size of this group is pretty telling in terms of there being some kind of effect. Having 26 of 50 pitchers qualify for this group suggests that if you picked a pitcher that worked with Mazzone at random, he had even odds of benefiting substantially from his tenure under Mazzone in FIP terms, both above what he did before as well as what he did after. That’s pretty big. Without weighting by innings, this group had a 104 FIP- before Mazzone, an 84 FIP- under Mazzone, and a 105 FIP- after Mazzone.

This is the mixed bag group, and you can see it’s a mixed bag for a few different reasons. Smoltz didn’t get worse post-Mazzone but did get better relative to pre-Mazzone. Millwood didn’t have a pre-Mazzone period but didn’t get too much worse after. Mike Hampton did get notably worse after, but didn’t really improve relative to his pre-Mazzone self as a Brave. And so on. Some of these, like Antonio Alfonseca, are likely edge cases and I’m not going to argue much if you want to move them to the “better” group, but it’s pretty clear that even with trying to hedge against there being an effect, there’s potentially something there. In terms of before/during/after splits, the FIP-s are 99, 93, and 107, respectively.

Lastly, we have the “no effect” group. There’s something pretty interesting here: see all the gray in the middle set of columns? That reflects players who basically started their careers with Mazzone. Of our group of 50 players, 19 of them did so. For all of the other players (31 of them), only seven showed “no effect,” which is under a fourth. For these 19 Mazzone-experiencing youngins, the count is nine, or about half. The interpretation, if you want to lean that way: whatever Mazzone was doing worked less well with young pitchers, at least relative to the rest of their careers. Aside from Mike Stanton, Jason Schmidt, and Derek Lilliquist, the other young arms that came up under Mazzone generally did not show the same effect on their FIP that we saw elsewhere. Similarly, of all the pitchers to come up under Mazzone (all the ones with gray shading in the middle columns), all of the ones to show a definite effect were early/mid-90s debuts, except for Australian Damian Moss. (Moss, for his own sake, definitely experienced a Mazzone effect in that he was below average in his rookie season but beat his peripherals handily; after being traded for Russ Ortiz, he became essentially unplayable and threw fewer major league innings in the rest of his career than in his rookie season under Mazzone.)

So, there you go. If you’re so inclined, here’s your Mazzone effect — lots of pitchers were better, even by FIP. Not all of them were, and in particular, young pitchers towards the end of Mazzone’s tenure didn’t seem to benefit the way other hurlers did. But in terms of a broad-based effect, it’s tricky to argue with pre-/during/post- aggregate FIP-s of 100/94/105, when you account for the fact that the average age changes from 27 to 29 to 33 in these buckets as well.

But what about aging?

Here’s the thing, though, as alluded to in the previous paragraph: pitcher skill isn’t static over time. Much like position players, pitchers age. But nailing down aging amidst changing coaches is a very dicey concept. I tried to address it in an overly simplistic, quick-and-dirty way. Here’s what I did.

First, I started with pitcher aging curves as noted here: https://www.fangraphs.com/blogs/pitcher-aging-curves-starters-and-relievers/. I used the FIP line on the “starter” chart — reliever aging is apparently incredibly aggressive, and experimenting with it rendered most of the rest of this analysis ridiculous, even for pure relievers in the sample.

I then arrayed FIP against FIP- for the entire period of any pitcher of the group of 50 in the sample — 1976 through 2015 — and found that the average relationship was about 18 points of FIP- for every 1.00 increase in FIP. Because the aging curve in the article above didn’t extend beyond age 37, I just extrapolated the basic trend once FIPs started worsening all the way out to age 43, the oldest year among pitchers in the sample. In case anyone cares, the relationship looks like this:

(Remember, kids, on average, players only get worse, not better, over time.)
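
To make the two mechanical steps here concrete (converting FIP changes into FIP- points, and stretching the aging curve out to age 43), here is a rough sketch. The 18-points-per-run slope comes from the text. The extrapolation rule, which simply repeats the last year-over-year step of the curve, is an assumption about what "extrapolating the basic trend" means, and the function names are hypothetical.

```python
from typing import Dict

FIP_MINUS_PER_FIP = 18.0  # about 18 points of FIP- per 1.00 of FIP (from the text)


def fip_change_to_fip_minus(delta_fip: float) -> float:
    """Convert a change in FIP into an approximate change in FIP-."""
    return delta_fip * FIP_MINUS_PER_FIP


def extend_curve(curve: Dict[int, float], last_age: int = 43) -> Dict[int, float]:
    """Extend an age -> value aging curve out to `last_age`.

    Assumption: the final observed year-over-year step simply repeats
    for every later age. The curve needs at least two ages.
    """
    ages = sorted(curve)
    step = curve[ages[-1]] - curve[ages[-2]]
    extended = dict(curve)
    for age in range(ages[-1] + 1, last_age + 1):
        extended[age] = extended[age - 1] + step
    return extended
```

Feeding this a curve that ends at age 37 yields values for ages 38 through 43 that keep worsening at the same yearly rate, which is crude but in the same quick-and-dirty spirit as the rest of the adjustment.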

Once I had this, the remaining analysis was fairly straightforward. I just took the average age of each pitcher in each stint (i.e., pre-Mazzone, Mazzone, post-Mazzone), found the average baseline FIP- for that age range, and then compared it to the actual difference in FIP-s across the stints. Since that was a lot of words lacking in clarity, what I really mean is:

  • Take Juan Cruz. Cruz pitched his age 23-25 seasons before Mazzone, which yields a baseline FIP- of 97. With Mazzone, he pitched his age 26 season, for a baseline FIP- factor of 99. The difference between these, i.e., the expected difference due to aging, was an FIP- increase of 2. In reality, though, Cruz’ pre-Mazzone FIP- of 102 dropped to 85 under Mazzone, so while the expected change was +2, the actual change was -17.
  • On the other hand, though, Cruz’ post-Mazzone career spanned ages 27 to 34, with an FIP- factor of 109. The difference between this and his Mazzone factor is an expected increase of 9-10 points of FIP-. That’s more or less what happened: his 85 FIP- under Mazzone rose to 96 post-Mazzone.
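
The Cruz example is just two subtractions per comparison, which the short sketch below spells out. The function name and the "excess" label are made up for illustration; the numbers are the ones quoted in the bullets above, using the rounded baselines (so the expected post-Mazzone change comes out to 10 rather than 9-10).

```python
def age_adjusted_change(fip_minus_from, fip_minus_to, baseline_from, baseline_to):
    """Compare an observed FIP- change to the change expected from aging.

    Returns (expected, actual, excess), where excess = actual - expected.
    A negative excess means the pitcher did better than aging alone predicts.
    """
    expected = baseline_to - baseline_from
    actual = fip_minus_to - fip_minus_from
    return expected, actual, actual - expected


if __name__ == "__main__":
    # Juan Cruz, pre-Mazzone -> Mazzone: baselines 97 -> 99, FIP- 102 -> 85.
    print(age_adjusted_change(102, 85, 97, 99))    # (2, -17, -19)
    # Juan Cruz, Mazzone -> post-Mazzone: baselines 99 -> 109, FIP- 85 -> 96.
    print(age_adjusted_change(85, 96, 99, 109))    # (10, 11, 1)
```

The first comparison shows a 19-point swing beyond aging; the second shows essentially nothing beyond aging, which is the pattern discussed next.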

I bring up Cruz specifically because in the tables above, he was in the first, “better” group. His pre/Mazzone split was 17 points in favor of a Mazzone effect; his Mazzone/post split was 11 points in favor of such an effect. Yet, with this age adjustment, we see that while that 17-point split was pretty profound in light of only a small aging effect, his 11-point split between his Mazzone-level performance and his post-Mazzone performance was pretty much directly in line with aging.

This sort of relationship tends to hold throughout, and gets fairly complicated pretty quickly. Comparing only pre-Mazzone to during-Mazzone splits, we have an average expected FIP- change due to aging of +11, and an observed average FIP- change of -9. That’s pretty huge: the Mazzone Effect appeared to, on average, reverse aging when grabbing hold of a pitcher. But, in terms of comparing during-Mazzone to post-Mazzone performances, we have a similar average expected FIP- change due to aging of +11, and an observed average FIP- change of... +12. There are potentially plausible explanations for this. One that quickly jumps out at me: Mazzone was a teacher, and he taught pitchers how to succeed, even after they left his tutelage. He brought them to a better baseline, and while they couldn’t stave off aging (it comes for us all), they didn’t suffer specifically because they were no longer working with him. Still, that’s just speculation; these data can’t really prove this, and I’m not sure they really even suggest it. (My wife calls this, in other contexts, fanwanking.)

Back to actual summaries of the data, though. Among the 31 pitchers that had pre-Mazzone careers, 23 of them had a fairly clear actual FIP- difference commensurate with a Mazzone effect in excess of their expected FIP- difference due to aging. The eight that didn’t don’t really have much in common, though it is potentially interesting that most of them (all but Pete Smith) joined the Braves in their late 20s or 30s, and that none of them worked under Mazzone for more than a year or two (again, except for Pete Smith).

Among the 43 pitchers with post-Mazzone careers, though, the results are more muddled. Only 12 had clear evidence of a Mazzone effect in excess of expected aging; the remaining 31 did not, or had a more muddled outcome. Among these “nots” are the Braves’ Big Three, along with other notables like Kevin Millwood. If we combine both sets of comparisons for each pitcher, where “effect” means seeing a potential effect beyond aging in both pre- and post- comparisons where available, “no effect” means not seeing it in either comparison, and “mixed” means, well, a mix of both, we get the following breakdown: 15 showing an effect, 14 showing a “mixed” or possible effect, and 21 showing no effect. Of the 21 showing no effect, 14 (two thirds) started their careers with Mazzone, however.
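
The combination rule in that last breakdown is simple enough to write down. This is a sketch under the definitions given in the paragraph above; the function name is hypothetical, and deciding whether a single comparison "shows an effect beyond aging" was itself a judgment call that this code takes as an input.

```python
from typing import Optional


def combine(pre_effect: Optional[bool], post_effect: Optional[bool]) -> str:
    """Combine the pre- and post-Mazzone comparisons for one pitcher.

    Each argument is True/False for "effect beyond aging seen", or None
    when the pitcher has no career on that side of his Mazzone stint.
    """
    available = [e for e in (pre_effect, post_effect) if e is not None]
    if not available:
        raise ValueError("need at least one comparison")
    if all(available):
        return "effect"
    if not any(available):
        return "no effect"
    return "mixed"


if __name__ == "__main__":
    print(combine(True, True))    # effect
    print(combine(True, False))   # mixed
    print(combine(None, False))   # no effect
```

One consequence of the rule: a pitcher who started his career under Mazzone has only one comparison available, so he can never land in "mixed" under this strict reading. That may be part of why so many of the "no effect" pitchers are career-starters.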

So, there you have it. Just for general interest purposes, a different set of tables is below, hiving off the 50 players into three groups once again, and showing the expected and actual changes for each comparison for each. Feel free to draw whatever conclusions you’d like from these data. For my money, it does seem like there was some there there as far as a Mazzone Effect goes, though its inability to hold up on the backend of pitchers’ careers when adjusting for aging is somewhat curious. Still, there are potentially enough plausible and/or fanwanked explanations for that that I wouldn’t write a Mazzone effect off out of hand.

So, there you have it. Now let’s all carry on having no idea how to evaluate pitching coaches. And maybe someone can do this for newly-hired Rick Kranitz as well!