Does Larkin Belong in the Hall of Fame? Revisited
I can't remember if I linked to this or not, but even if I have, it's worth linking again: Rally has posted season-by-season WAR estimates for all players in the Retrosheet era. He also has a top-300 ranking, so we can look at the best of the past 50+ years using these numbers.
Rally's data include offense, defense (including turning double plays, etc.), baserunning, and era-specific position adjustments. This is similar to what I tried to do in my piece on Larkin, but better because of the baserunning and especially the era-specific position adjustments. Here is how the shortstops I included in my Larkin study pan out in Rally's WAR data, plus a number of others who came up in discussions following my Larkin piece:
Some players got a big boost in their rating, like Ozzie and Aparicio, once you include baserunning (I only included SBs and CSs) and double play turning. But as you can see, the results are more or less the same as far as Larkin is concerned: probably the 4th-best overall, and 2nd-best pure shortstop, in the Retrosheet era, at least based on total contributions to their ballclubs relative to their league.
Career-level WAR accumulation isn't the be-all and end-all of Hall of Fame voting, as peak performance is also important. But in Larkin's case, career WAR is crucial. No one disputes that he was a brilliant player when healthy. The knock on him is that he didn't play enough due to all of his injuries. These data clearly indicate that his total contribution, including playing time, was among the best in baseball history at his position.
Rally has Larkin as the 30th-best position player of the Retrosheet era. I know he almost certainly will not be a first-ballot Hall of Famer, but he probably should be.
Why I'm trying to stop using OPS
Colin followed up the study on run estimators he posted last week with an improved method. This time, instead of looking at half-inning or even game-level combinations of team offense, he focused on identifying the average value of particular offensive events to games. His methodology was to take matched games--games that had the same numbers of every major counting event except one, and that differed only in how many of that one specific event they contained.
For example, he might have a game with 5 singles, 3 doubles, 1 homer, and 3 walks, and he'd compare that to a game with 5 singles, 3 doubles, 1 homer, and 4 walks. Finding the average difference in runs scored between pairs of games like this would tell you the average value of a walk in runs. He then compared those actual differences in runs scored to the expected difference in runs scored according to a variety of run estimation mechanisms.
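To make the matched-games idea concrete, here is a minimal Python sketch. It is my own illustration, not Colin's actual code: the event list, the function name `event_value`, and the three sample games are all made up, and it compares the average runs of each matched profile rather than every individual pair of games.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical set of "major counting events" used to match games.
EVENTS = ("singles", "doubles", "triples", "homers", "walks")

def event_value(games, event):
    """Estimate the run value of one extra `event`.

    Games are grouped by their full event profile. For every profile
    that has a matching profile with exactly one more of `event` (and
    identical counts of everything else), take the difference in mean
    runs scored, then average those differences.
    """
    idx = EVENTS.index(event)
    runs_by_profile = defaultdict(list)
    for g in games:
        profile = tuple(g[e] for e in EVENTS)
        runs_by_profile[profile].append(g["runs"])

    diffs = []
    for profile, runs in runs_by_profile.items():
        plus_one = list(profile)
        plus_one[idx] += 1
        plus_one = tuple(plus_one)
        if plus_one in runs_by_profile:
            diffs.append(mean(runs_by_profile[plus_one]) - mean(runs))
    # None means no matched profiles were found for this event.
    return mean(diffs) if diffs else None

# Made-up games, loosely following the example in the text.
sample_games = [
    {"singles": 5, "doubles": 3, "triples": 0, "homers": 1, "walks": 3, "runs": 4},
    {"singles": 5, "doubles": 3, "triples": 0, "homers": 1, "walks": 3, "runs": 5},
    {"singles": 5, "doubles": 3, "triples": 0, "homers": 1, "walks": 4, "runs": 5},
]

walk_value = event_value(sample_games, "walks")
```

With real data you would feed in many seasons of game logs, and you would probably want to weight each difference by how many games sit behind it.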
The results? Linear weights-based methods did the best. This includes his "house" linear weights (which he kindly shares), as well as manipulations of linear weights like wOBA. A bit behind them were GPA (aka 1.7 OPS), Base Runs and BPro's EqR, followed a bit more distantly by Bill James' Runs Created. The worst of the bunch were the OPS-based methods, as well as the even-more-horrible Total Average (bases/outs).
This is strong evidence that we should more or less stop using OPS to evaluate hitters. It's unnecessary, given how easy wOBA is to calculate. Is it better than batting average? Sure, of course. But it misses badly enough and often enough that we should really move past it. It's a tough habit to break, but it's time to wOBA, folks.
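To show how little work wOBA takes, here is a rough Python sketch next to OPS. The weights are the approximate ones commonly quoted from The Book (they shift a bit from season to season), and I'm using a simplified version that ignores intentional walks and reaching on error. The function names and the stat line are made up for illustration.

```python
def woba(bb, hbp, singles, doubles, triples, hr, pa):
    """Simplified wOBA: linear weights scaled to look like OBP.

    Weights are approximate and assumed here, not taken from this post.
    """
    weighted = (0.72 * bb + 0.75 * hbp + 0.90 * singles
                + 1.24 * doubles + 1.56 * triples + 1.95 * hr)
    return weighted / pa

def ops(bb, hbp, singles, doubles, triples, hr, ab, sf):
    """On-base percentage plus slugging percentage."""
    hits = singles + doubles + triples + hr
    obp = (hits + bb + hbp) / (ab + bb + hbp + sf)
    total_bases = singles + 2 * doubles + 3 * triples + 4 * hr
    slg = total_bases / ab
    return obp + slg

# Hypothetical season line: 500 AB, 60 BB, 5 HBP, 5 SF (570 PA).
line = dict(bb=60, hbp=5, singles=90, doubles=30, triples=3, hr=20)
player_woba = woba(pa=570, **line)
player_ops = ops(ab=500, sf=5, **line)
```

The point of the comparison is that wOBA needs the same counting stats as OPS. The only difference is that each event gets a weight tied to its run value, instead of the weighting that falls out of adding OBP and SLG.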
CHONE is a really good projection system
Matt has a fairly exhaustive projection roundup here. He notes that each system seems to have its own strengths, but often also some weaknesses:
--CHONE was the best at projecting most things.
--PECOTA was very close behind but had some systematic biases, specifically for speedy players' BABIPs, which ZIPS struggled with as well.
--ZIPS is behind the other systems, except it does quite well with projecting the three true outcomes for players over 35.
--CHONE does better with older players in general, since its specialty is aging curves, but PECOTA does better at finding comparable players for younger players for whom less data is available (unless they fall into the speedster category).
--OLIVER clearly contends and even takes the lead at some things--especially at projecting hitters with lower homerun totals and other players significantly affected by park effects. However, OLIVER under-projects walks and strikeouts systematically and over-projects homeruns systematically, and could probably be improved by adjusting how those outcomes are computed.
The nice thing about this is that we can use this information to give more or less weight to a given projection system when it differs from others in predicting a given player's performance based on the sort of player we're looking at. Or, we can do what I've essentially decided to do around here, which is to just use CHONE. :)
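If you did want to weight the systems differently depending on the type of player, a bare-bones Python sketch might look like the following. The projected wOBAs and the weights are invented purely for illustration. They don't come from Matt's roundup, and picking sensible weights is the hard part.

```python
def blend(projections, weights):
    """Weighted average of one stat across projection systems.

    Both arguments are dicts keyed by system name; only systems that
    appear in `projections` are used.
    """
    total_weight = sum(weights[s] for s in projections)
    return sum(projections[s] * weights[s] for s in projections) / total_weight

# Hypothetical wOBA projections for an older hitter.
projections = {"CHONE": 0.350, "PECOTA": 0.340, "ZIPS": 0.330}

# Hypothetical weights: lean on CHONE for older players.
older_player_weights = {"CHONE": 2.0, "PECOTA": 1.0, "ZIPS": 1.0}

blended_woba = blend(projections, older_player_weights)
```

You could keep a separate weight table for each player type (speedsters, young players, and so on) and choose one before blending.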
It's worth noting that Matt's is just the latest projection roundup in which CHONE did particularly well. Whether it will continue to do so in the future is an open question, of course, but the data suggest that it's as good as they come.
The Dunning-Kruger effect
JC posted about this terrific psychological concept: that people incompetent in a particular discipline will massively overestimate their competency in that discipline. That's pretty much the definition of a baseball fan, isn't it? :)
I'm jesting, mostly. You certainly see arguments between baseball fans who really know their stuff and baseball fans who just think they know their stuff. And I tend to think that most of what you hear on talk radio (sports or otherwise) involves people who fall into the latter category rather than the former. And, of course, I tend to think that on at least some issues (some areas of biology, some areas of baseball research, etc.), I fall into the reasonably competent category.
But the great part of this is that the Dunning-Kruger effect predicts that we'll have a very hard time being able to tell whether we're competent or not...because the more incompetent we are, the less we'll realize it! :)