As discussed earlier, when thinking about reliever value, it's not enough to consider only the rate at which a reliever gives up runs, because some runs are more valuable than others. Closers, in particular, tend to pitch in high-leverage situations, and therefore should get more "credit" for their ability to pitch above reliever replacement level than a pitcher who only pitches in games that have a lopsided score.
For players since 2002, we can get actual pLI data from FanGraphs, and I discussed how to employ those data to adjust reliever run value estimates previously. However, what if you want to look at reliever value among players who played prior to 2002, like in my proposed series on past winning Reds teams? In that situation, you'd need some way of inferring reliever usage from other statistics.
One way to try to do this is by looking at performance--better pitchers should be used in higher-leverage situations. However, when attempting this approach, I've found that there's just very little predictive power (i.e., a huge amount of scatter), even though there is a significant relationship between ERA (or FIP) and pLI. Whether that's due to within-team competition, inconsistent reliever performance, or poor decisions by managers, performance is just not a very good way to predict pLI.
On the other hand, as Darren implied, even in historical databases like Lahman's, we have at least one statistic that tells us something about usage: saves. Saves are well documented to be a rather poor indicator of reliever quality. Nevertheless, they do tell you who was pitching in the 9th inning of a team's games, which tends to be the inning with the highest leverage. So we should be able to use saves to infer something about reliever usage. Here's what I did:
I pulled stats, including both traditional pitching statistics and pLI, from FanGraphs on all pitchers, 2002-2007, who threw at least 25 innings in relief in a season. There is some selection bias in such a sample, because it will tend to exclude a lot of bad pitchers who weren't given the opportunity to throw 25 IP. But it still does include pitchers who span much of the range in terms of performance, and it gets around the issue of dealing with stats on pitchers with extremely small samples (not that 25 IP is a big sample...).
Next, I calculated saves per inning (Srate) as an indication of the proportion of a pitcher's innings that were associated with saves:
Srate = Saves/IP
It's important to use a rate because you want to know something about a player's opportunities. If someone gets 20 saves in 20 innings, they're probably pitching in much higher leverage situations, on average, than someone who gets 20 saves in 70 innings. Ideally, I'd also use blown saves--and maybe holds--but those stats are not available in the Lahman database or on baseball-reference's team pages, so I'm going to ignore them for now.
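For anyone who wants to reproduce the Srate step, here's a minimal Python sketch. The function names are my own invention for illustration, and the outs-to-innings helper assumes your data source stores innings as outs recorded (Lahman's pitching table does this, as far as I know, but check your copy).

```python
def innings_from_outs(outs):
    """Convert outs recorded to innings pitched (3 outs per inning).

    Assumption: the source stores innings as outs; skip this step if
    your data already has true (fractional) innings.
    """
    return outs / 3.0


def save_rate(saves, innings):
    """Srate = Saves / IP, as defined in the text."""
    if innings <= 0:
        raise ValueError("innings must be positive")
    return saves / innings


# The two pitchers from the example above: 20 saves in 20 IP vs. 20 saves in 70 IP.
for ip in (20, 70):
    print(f"20 saves in {ip} IP -> Srate = {save_rate(20, ip):.3f}")
```

The same 20 saves produce very different rates, which is the whole point of dividing by innings.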
I also converted pLI to a "rate" statistic using the approach suggested by Tom Tango:
rateLI = pLI/(pLI+1)
pLI = 2 ---- rateLI = 0.667
pLI = 1 ---- rateLI = 0.500
pLI = 0.5 ---- rateLI = 0.333
This was important because as a pure ratio, pLI changes at a faster rate above 1.0 than it does below 1.0, which makes it hard to model using a regression-based approach.
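Here's a small Python sketch of that transformation and its inverse, mostly to show the symmetry it buys you: a pLI of 2 and a pLI of 0.5 end up equally far from 0.5 on the rateLI scale. The function names are hypothetical; only the two formulas come from the text.

```python
def rate_li(pli):
    """Map pLI (a ratio centered on 1.0) to rateLI (a 0-1 scale centered on 0.5)."""
    if pli < 0:
        raise ValueError("pLI cannot be negative")
    return pli / (pli + 1.0)


def pli_from_rate(rate):
    """Inverse transformation: rateLI back to pLI."""
    if not 0 <= rate < 1:
        raise ValueError("rateLI must be in [0, 1)")
    return rate / (1.0 - rate)


# The three reference points listed above.
for pli in (2.0, 1.0, 0.5):
    r = rate_li(pli)
    print(f"pLI = {pli:<4} rateLI = {r:.3f}  distance from 0.5 = {abs(r - 0.5):.3f}")
```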
Anyway, here's a plot of Srate vs. rateLI:
Obviously, that's a pretty ugly-looking relationship down in the zero/low-saves groups. But as you can see, there's a pretty nice relationship among pitchers who actually have a modest number of saves and their pLI. In other words, once someone starts to get saves, you can reasonably predict that he'll have an above-average pLI, and the player's pLI should steadily increase from there.
I decided to run with this and, in what I completely admit is a really terrible abuse of regression math (I've violated just about every assumption one can violate), I fitted a curve to this relationship. I found that a second-order polynomial seemed to fit the data well. Furthermore, I forced the y-intercept to come in at a rateLI=0.5 (pLI=1.0), such that the average pitcher without saves is expected to pitch in average leverage (otherwise, the equation tended to predict that the vast majority of pitchers would have a pLI=0.8, and that's not reasonable). Here's the equation:
rateLI = -0.3764*(Srate^2) + 0.5034*Srate + 0.5
which we can convert back to pLI by:
pLI = rateLI/(1-rateLI)
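For the curious, here's one way a second-order fit with a forced intercept could be done in plain Python. This is a sketch under assumptions, not a record of the tool or procedure used for the equation above: the function name is hypothetical, and the data points are synthetic, generated from the published coefficients without noise just to show that the fit recovers them. The trick is to subtract the fixed intercept from rateLI and then regress what's left on Srate^2 and Srate with no constant term.

```python
def fit_quadratic_fixed_intercept(x, y, intercept=0.5):
    """Least-squares fit of y = a*x^2 + b*x + intercept, intercept held fixed.

    Solves the 2x2 normal equations for a and b directly.
    """
    resid = [yi - intercept for yi in y]       # remove the forced intercept
    s4 = sum(xi ** 4 for xi in x)
    s3 = sum(xi ** 3 for xi in x)
    s2 = sum(xi ** 2 for xi in x)
    t2 = sum(xi ** 2 * ri for xi, ri in zip(x, resid))
    t1 = sum(xi * ri for xi, ri in zip(x, resid))
    det = s4 * s2 - s3 * s3
    if det == 0:
        raise ValueError("not enough distinct x values to fit two coefficients")
    a = (t2 * s2 - t1 * s3) / det
    b = (s4 * t1 - s3 * t2) / det
    return a, b


# Synthetic, noise-free points (NOT the FanGraphs sample) built from the equation above.
srates = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6]
rate_lis = [-0.3764 * s ** 2 + 0.5034 * s + 0.5 for s in srates]

a, b = fit_quadratic_fixed_intercept(srates, rate_lis)
print(f"rateLI = {a:.4f}*(Srate^2) + {b:.4f}*Srate + 0.5")
```

With real, noisy data the recovered coefficients would of course depend on the sample.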
Now, this rather shaky regression equation isn't something that I'd try to publish in the SABR newsletter, much less an academic journal. It's not built upon rigorous math. But it actually works pretty darn well. For demonstration, here's a table showing a hypothetical pitcher who has thrown 70 innings, and how his predicted pLI changes as the number of saves (and thus his Srate) increases:
| Saves (70 IP) | Srate | rateLI | pLI |
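Rows for a table like this can be generated straight from the equation. Here's a minimal Python sketch; the function names and the particular save totals are my own picks for illustration and aren't necessarily the ones the table originally used.

```python
def predicted_rate_li(srate):
    """Fitted equation from the text: rateLI as a function of Srate."""
    return -0.3764 * srate ** 2 + 0.5034 * srate + 0.5


def predicted_pli(saves, innings):
    """Predicted pLI for a pitcher with the given saves and innings pitched."""
    rate = predicted_rate_li(saves / innings)
    return rate / (1.0 - rate)              # convert rateLI back to pLI


def leverage_table(innings, save_totals):
    """Return (saves, Srate, rateLI, pLI) rows for a fixed innings total."""
    rows = []
    for saves in save_totals:
        srate = saves / innings
        rows.append((saves, srate, predicted_rate_li(srate),
                     predicted_pli(saves, innings)))
    return rows


# Hypothetical save totals for a 70-inning reliever.
print("Saves  Srate  rateLI  pLI")
for saves, srate, rate, pli in leverage_table(70, [0, 10, 20, 30, 40]):
    print(f"{saves:>5}  {srate:.3f}  {rate:.3f}   {pli:.2f}")
```

One caveat that follows directly from the coefficients: the quadratic peaks near Srate = 0.67 (about 47 saves in 70 IP), so the predicted pLI flattens out and then declines beyond that point. It's probably best not to extrapolate past the range of the data.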
Anyway, I think that this is a pretty reasonable way to adjust for historical reliever leverage, at least among closers. Obviously, we're going to undervalue some relievers that aren't yet in the setup role but pitch in lots of big-time leverage situations in the 7th or 8th innings. But I think this approach will capture a lot of what we're trying to do with a reliever leverage adjustment.
On a moderately related note...last night, I spent some time setting up spreadsheets and my database to start on the Winning Reds historical series. Should be pretty efficient at this point, which should make it easier to get through the teams at a good clip as long as I keep the writing under control. I'm excited to get started on the series, but I think I'll do a dry run first in wrapping up the 2007 Reds' season. Look for that shortly.