Part 2b: Baselines
Ok, now that we've decided upon a runs estimator--linear weights derived from a BaseRuns model that fits our particular context--we need to think about how exactly we're going to go about assessing player value. There are four typical ways that folks use offensive runs data to assess player value or performance: absolute runs, runs per game, runs above average, and runs above replacement.
To anchor our discussion, below I've created a table that reports values for the 2007 Reds based on each of these approaches. Runs values were estimated using the linear weights for the '03-'07 National League reported in the previous article, and replacement level was defined as 73% of position player league average, with no positional adjustment (more on that later):
|Player|PA|Absolute Runs|Player|PA|Runs Per Game|Player|PA|Runs Above Average|Player|PA|Runs Above Replacement|
|---|---|---|---|---|---|---|---|---|---|---|---|
|Phillips|702|97.9|Votto|89|7.1|Griffey Jr.|623|15.9|Griffey Jr.|623|36.4|
Alright, let's walk through these rankings one by one.
Absolute Runs: An estimate of the absolute number of runs produced by a player.
Absolute Runs are just the direct output of our linear weights equation (see here for an overview of linear weights).
The top of this scale looks appropriate, as all of the Reds' best offensive players are up near the top of this chart. And the bottom seems appropriate too, with players who weren't around much bringing up the rear. But to my eye, there are some issues with how players are ranked near the middle.
With an absolute scale, any time you do something positive on the ball field, you're rewarded. The result of this is that players that get more playing time, almost regardless of their production in that playing time, will tend to create more runs than players with less playing time. In fact, the correlation on these Reds between PA's and absolute runs is 0.98! Now, it's true that, in fact, those players did do more total good things than players that are ranked lower. The problem is that they also might have done a lot of terrible things, like create lots of outs, which aren't fully accounted for in an absolute runs statistic.
Ryan Freel, for example, had a rather poor 0.655 OPS this season. That's well below league average, and is about the minimal level of production you'd anticipate from any scrub you call up from AAA. And yet the absolute runs estimate ranks his production as virtually identical to Javier Valentin (0.715) and Jeff Conine (0.729). Now, those guys didn't exactly knock the cover off the ball, but they certainly had a better season than Freel did...and yet, Freel is rated as their equal because he had more chances, even though his tendency to create outs may have hurt the Reds more than he helped them.
A more controversial (to some) example comes at the top of the chart. Brandon Phillips is rated second on the Reds' team with just under 100 runs produced. And yet, his low OBP caused his OPS to be a fairly mundane 0.816 this season. Certainly not bad, but would you rank his offensive contributions above guys like Griffey (0.869), Hatteberg (0.868), or Hamilton (0.922)? Maybe you would; there is certainly value to playing as much as Phillips played (700+ PA's). But maybe not, as those guys were better producers on a per-PA basis than Phillips was. For example, it could be the case that combining Griffey's plate appearances with plate appearances from an AAA scrub (DeWayne Wise?) would result in more production than Phillips' production alone in the same number of PA's. To assess this, we'll need to look at other approaches.
Runs Per Game: Per-game rate of runs production.
This is a pure rate stat. And as we discussed in the piece on run estimation, because absolute runs don't completely account for the impact of producing outs on a team (fewer opportunities for future batters), the most correct way to convert absolute runs to a rate is to use runs per out. In this case, we're converting it to runs per 26.25 outs, because teams will average 26.25 outs per 9-inning game on offense (27 in all away games, 27 in losing home games, 24 in winning home games):
RpG = Runs/Outs*26.25
Where outs is AB-H+SF+0.92*SH.
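Here's a minimal sketch of that calculation in Python. The 26.25 figure falls out if you assume half of all games are played on the road and the home team wins half of its home games; that split is an assumption behind the number, not something measured here. The function names and the stat line are made up for illustration and don't belong to any real player.

```python
# Average offensive outs per 9-inning game, assuming half of all games are
# away games and the home team wins half of its home games (an assumption
# implied by the 26.25 figure).
OUTS_PER_GAME = 0.5 * 27 + 0.25 * 27 + 0.25 * 24  # 26.25


def estimate_outs(ab, h, sf, sh):
    """Outs = AB - H + SF + 0.92*SH, as defined above."""
    return ab - h + sf + 0.92 * sh


def runs_per_game(runs, outs):
    """RpG = Runs / Outs * 26.25."""
    return runs / outs * OUTS_PER_GAME


# Hypothetical stat line, for illustration only (not a real player).
outs = estimate_outs(ab=500, h=140, sf=5, sh=0)  # 365 outs
rpg = runs_per_game(runs=80.0, outs=outs)
print(f"{outs:.1f} outs, {rpg:.2f} runs per game")
```

Because the rate is scaled by outs rather than plate appearances, a hitter who makes fewer outs for the same number of runs gets a higher RpG.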
Because this is purely a rate statistic, it completely ignores playing time. Brandon Phillips' ranking takes a fairly big hit here, and we see just how irrelevant Juan Castro and Chad Moeller were as hitters. Those are insights we don't get from absolute runs.
But at the same time, we see Joey Votto ranked among the best hitters the Reds had. It's true that his per-out and per-PA rates were excellent, but he also didn't even get 100 PA's, so we have to figure that the uncertainty around those estimates is pretty darn high. Votto's performance might portend great things to come, but in terms of evaluating historical value to the Reds team, I prefer the next two approaches.
Runs Above Average: Players get credit or are penalized for their performance relative to their competition.
To calculate runs above average, you convert a player's total runs produced to runs per game, subtract the league average rate of runs per game from that rate, and then convert the resulting number back to a season total for that player. Or:
RAA = (RpG - LgAvgRpG)/26.25*Outs
Note, for National League players in particular, I think it's important to use league average runs per game for position players, rather than overall league average runs per game. Otherwise, you're including pitchers among the hitters that you're comparing your position players to, which seems inappropriate to me--we're not really interested in assessing how well our players hit relative to pitchers, are we? We want to know how well they hit vs. other position players. The difference is substantial (~0.3 r/g), so it's worth doing. In 2007, NL position players averaged 5.1 r/g according to my numbers.
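As a rough sketch, the RAA equation above looks like this in Python, using the 5.1 r/g position-player average quoted for the 2007 NL. The function name and the example hitter are hypothetical, chosen only to show the arithmetic.

```python
OUTS_PER_GAME = 26.25
LG_AVG_RPG = 5.1  # 2007 NL position players, per the figure quoted above


def runs_above_average(rpg, outs, lg_avg_rpg=LG_AVG_RPG):
    """RAA = (RpG - LgAvgRpG) / 26.25 * Outs."""
    return (rpg - lg_avg_rpg) / OUTS_PER_GAME * outs


# Hypothetical hitter: 6.0 runs per game over 400 outs.
print(round(runs_above_average(6.0, 400), 1))  # about 13.7

# A hitter producing exactly at the league rate comes out at zero,
# no matter how much he played.
print(runs_above_average(LG_AVG_RPG, 400))  # 0.0
```

The second call shows why playing time only matters here for hitters who differ from average: the outs term multiplies a gap that is zero for an exactly average player.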
Now, from the perspective of this method, players are valuable in as much as they help you beat other teams. After all, if the point is to win more games than you lose, you want players that help you accomplish this task. Therefore, you should compare each player to league average production. If a player produces above league average, he's helping you win. If he's below league average, he isn't helping you win--at least on offense.
There are a lot of advantages to using this baseline. First and foremost, it's very straightforward. If a player is above or below average, you immediately know something about how their performance was tied to your team's success. For this reason, I think it's probably one of the better statistics by which one can evaluate MVP's or other top hitting awards (Dunn, for example, was clearly the Reds' best hitter in 2007). This approach also provides some balance between valuation of playing time and performance: Phillips is ranked below guys like Hatteberg, Hamilton, and Griffey, but above Cantu and Votto (though ranking him below Keppinger is a bit problematic for me given how little Keppinger played).
Still, I'm not a huge fan of this baseline for most purposes. Two reasons. The first is largely aesthetic, and has to do with how it ranks players. For example, looking at their RAA, you see that Jeff Conine and Javier Valentin are rated as below average--which ranks them "below" the value of someone like Jason Ellison, who got such a paltry number of plate appearances that he didn't have time to contribute much of anything, positive or negative, to the team.
My second issue is the fact that an average player is given a value of "zero," which seems to ignore how hard it can be to find "average" or even slightly below average players. The Reds had a pretty good offensive team this season, and as a result, most of their position players were ranked as above average. But Alex Gonzalez, who had a respectable (and career-best!) 0.793 OPS this year in 430 PA's, was ranked as just barely below average. And yet, that sort of player doesn't just grow on trees. Average players have a lot of value--they may not be helping the team win, but they're helping the team avoid losing, and thus avoid negating the production of the above-average players.
I realize that advocates of comparison to average systems argue that the claim that average players are undervalued by this approach is a misinterpretation of the data. And that's fair enough. But even if most of us properly interpret the data, we can't be guaranteed that all folks reading our work--especially those who are less experienced in thinking about these numbers--won't misinterpret them. That's why I prefer the last approach...
Runs Above Replacement: Players get credit for production above that which you would expect from an AAA scrub.
Calculation of RAR is similar to that of RAA, except that here you subtract a certain percentage of league average, rather than straight-up league average, from a player's runs per game. I recommend using 73% of position player league average (more on that below). Here's my equation:
RAR = (RpG - 0.73*LgAvgRpG)/26.25*Outs
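A short Python sketch of the RAR equation, again assuming the 5.1 r/g league figure. Since RAR and RAA differ only in the baseline, the gap between them is (1 - 0.73) * LgAvgRpG / 26.25 * Outs, which allows a back-of-the-envelope consistency check against the Griffey Jr. row in the table. The function names are hypothetical, and the implied outs figure is only approximate because the inputs are rounded.

```python
OUTS_PER_GAME = 26.25
LG_AVG_RPG = 5.1          # 2007 NL position players, per the article
REPLACEMENT_SHARE = 0.73  # replacement level as a share of league average


def runs_above_replacement(rpg, outs, lg_avg_rpg=LG_AVG_RPG,
                           share=REPLACEMENT_SHARE):
    """RAR = (RpG - 0.73*LgAvgRpG) / 26.25 * Outs."""
    return (rpg - share * lg_avg_rpg) / OUTS_PER_GAME * outs


def implied_outs(rar, raa, lg_avg_rpg=LG_AVG_RPG, share=REPLACEMENT_SHARE):
    """Back out Outs from RAR - RAA = (1 - share)*LgAvgRpG/26.25*Outs."""
    return (rar - raa) * OUTS_PER_GAME / ((1 - share) * lg_avg_rpg)


# Griffey Jr.'s row in the table: 15.9 RAA and 36.4 RAR.
print(round(implied_outs(rar=36.4, raa=15.9)))  # 391
```

Roughly 390 outs is a believable total for a hitter with 623 PA, so the two columns look consistent with the equations as written.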
The idea behind this approach is that even an AAA scrub--often dubbed "freely available talent," because even if a GM doesn't have an appropriate player in their system, they should be able to acquire one for next to nothing--is going to do a fair number of "Good Things" on the field if you play them for a full season. But their rate of Good Things will be much lower than what you'd expect from most MLB players, be they bench players or starters. Therefore, this approach sets a minimum level of production against which all players are judged, and players gain value based on how much better they perform above this minimum.
What I like about this approach is that it recognizes something about the importance of average performers, and more closely reflects the decisions a GM has to make. If a player is below-average, that might be a spot on the roster that a GM may try to improve. But as long as he's producing above the minimum level, he's producing at a level that provides real value for his team (i.e., better than talent that could be acquired for free). Such a player may not be helping the team win, but he's helping the team avoid losing to a degree greater than what you could expect from any Joe Schmo pulled off the waiver wire or from an AAA team.
From a practical standpoint, I really like how this approach ranks the '07 Reds. Phillips is given a high ranking, which recognizes how valuable it is to have a guy who can produce at an above-average clip across 700 plate appearances. And yet, it also recognizes that excellence in performance is important, as evidenced by Joey Votto's (0.907 OPS in 89 PA's) ranking above guys like Conine, Valentin, Ross, and Freel.
The latter two guys, Ross (0.670 OPS) and Freel (0.655), were identified as hitting a touch below replacement level in the NL. That's not a good thing, and it reflects how bad their performances were--you might not gain anything by sticking any old AAA veteran in for those PA's, but you wouldn't lose anything either (remember, we're talking strictly about offense). Replacement level also recognizes, perhaps more so than any of the other approaches, the absolutely disastrous seasons by Juan Castro (0.446 OPS) and Chad Moeller (0.417) at the plate (~150 PA's between the two of them!!).
There are some problems, however, with using a replacement-level approach. The most significant is that there's not really a great consensus on where, quantitatively, we should draw the line of minimum performance. Brandon Heipp discussed a number of the attempts to come up with some kind of objective measure of replacement level in his article on baselines. I myself recently completed a study that tries to look at this, and I found that there wasn't a perfectly clear number, but it probably is somewhere between 70 and 80% of league average. At this point, I recommend using 73%. There is good theoretical justification for that number, and it falls within the empirical bounds for my study... and, compared to the higher alternatives (Keith Woolner uses 80% in VORP for most positions), it runs less risk of dismissing production that may be genuinely hard to replace.
The other problem with replacement level has to do with its typical justification. One traditional argument for replacement level is that it's the production above that of whoever would replace a player if they were no longer present on a team. As Brandon Heipp points out, however, this isn't really what we're assessing when we employ a 73% or 80% baseline: if a starter goes down due to injury, he will typically be replaced in the lineup by a bench player who will hit substantially better than replacement level. If we really were trying to identify value over replacement, we should probably try to use a method like the chaining approach Heipp describes.
But in actuality, I don't think that's really what we're after. I think we're trying to compare players to a minimum level, below which a player won't be able to retain even a fringe bench job on a major league roster. Anything better than that is worth noting. Anything less than that means you don't deserve to be in the majors...unless, of course, your defense makes up for it (more on that in a future piece).
In sum, I find runs produced over replacement, as described here, to be a very useful number. It gives a low baseline above which we can reasonably expect all players to play, it recognizes the value of average ballplayers, and player rankings based on this baseline make intuitive sense. Therefore, this is the number that I prefer to use when evaluating a player's offensive value to his team. As we argued in the first piece, however, offense is just half of a player's value...we also need to evaluate his defense. And there's also the question of whether offensive production by a player at one position is as valuable as production at another position. More on that next time...
Update (11/7/07): I updated the above table to use slightly different linear weights, this time factoring in GDP's.
Coming up next...Positional adjustments on offensive numbers, and why we shouldn't do it.