Friday, February 25, 2005
What do they show? The top graph shows a consistent, high-level performer who can be expected to deliver the goods at about the same rate as the last few years. Maybe a little higher or a little lower than 2004 (.292/.352/.471) but we can surmise that this guy will be there contributing with an .820+ OPS.
The player in the middle graph has not achieved the higher, absolute levels of the other two but shows consistent improvement from year to year. This player probably has not yet peaked in his batting skills. We can look forward to a 2005 season, and a few more after that, that will most likely be an improvement over his 2004 season (.287/.327/.401) and more like his 2003 or 2002 season (.305/.364/.439).
The player in the bottom graph displayed awesome skills earlier in his career. A level of hitting talent the other two players will probably never even touch. But, alas. Fate has been unkind. A fastball on the wrist changed everything. The slope of the trendline is ominous, but even in a declining phase this player started from such a prodigious talent level that 2005 should still produce respectable performance. Beyond that is an unknown risk.
The players are Derek Jeter, Edgar Renteria, and Nomar Garciaparra. But you probably already knew that.
Friday, January 28, 2005
PECOTA Put Me In A Quandary
My winter project is to build a spreadsheet for each AL team that shows the runs created and runs allowed contribution for each player. I downloaded the Lahman database and went to work converting batting statistics and pitching statistics into Base Runs. I created what seems like a pretty effective method for converting fielding stats into runs allowed. Using simple exponential smoothing methods I came up with a projection for every player - and voila, it all seemed to work. The Red Sox and the Yankees came in at about 100 wins, the Twins around 90 wins, and the Angels a little higher than that. Team totals for runs scored and runs allowed were in the range of expectation. And then PECOTA came out.
I replaced all my batting and pitching numbers with PECOTA numbers, with these results: the Red Sox went down 80 runs scored from '04 and 90 runs from '03. Runs allowed went through the roof when I included the fielding component, and wins dropped to 90 or less. I got the same results with the other teams for which I substituted numbers. Now, I either have to believe that Nate Silver has tons of prescience or that he screwed the pooch with this year's PECOTA projections.
Saturday, January 22, 2005
Bad Moon Rising For Trot?
The Age 27 Paradigm
The last data point is a 2005 projection by a neural net. I ran a separate projection for each offensive counting statistic. It caught my attention when all the independent projections fit together to make a reasonable Trot-like season with low PAs. It's the first year I ever tried using neural nets for baseball forecasting so who knows. The same neural net also forecast that Randy Johnson would pitch only 143 innings in '05 so it can't be all bad.
Tuesday, January 18, 2005
Johnson v. Schilling
Randy Johnson will marquee the Opener for the Yankees but much to the disappointment of television execs and baseball fans everywhere Curt Schilling likely will not be available for service until some weeks later. With 20 of these sumo matches on the schedule we'll only have to delay our enthusiasm.
How do Randy Johnson and Curt Schilling match up over the last four years? You may not be familiar with BsR and FIP. If not, please scroll down to the Pedro Martinez post for an explanation and links.
|C. Schilling||2002||259.3||.612||36.33||-0.602|
I'd give Randy Johnson the statistical edge, since he has two seasons in which he pitched like an elite closer for 240+ innings. But, really? What's the difference between getting smashed by a truck that weighs 30,000 pounds and one that weighs 35,000 pounds?
Sunday, January 16, 2005
Although it sounds ambitious these projections are straightforward. The overall strategy is to interpret for each roster member, as appropriate for their position, an offensive production rate, a defensive production rate, and a pitching production rate; convert these rates into runs scored or runs allowed per player based on their estimated playing time, and then total the numbers and run them through the Pythagorean Formula to quantify the team's winning percentage. For each non-pitcher in the American League you want to end up with something like this to plug into a spreadsheet: J. Damon 613 14.5 .9. That's the player's name, estimated plate appearances, offensive rate per 100 PA (add to runs scored), and defensive rate per 100 PA (add to runs allowed).
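Here is a minimal sketch of how one of those spreadsheet lines turns into runs. The function name and the tuple layout are my own illustrative choices, not anything from the actual spreadsheet; the only real data is the J. Damon line above.

```python
# Sketch: convert a roster line (name, PA, RS rate, RA rate) into run totals.
# player_runs and the roster layout are hypothetical names for illustration.

def player_runs(plate_appearances, rs_rate_per_100, ra_rate_per_100):
    """Return (runs added to Runs Scored, runs added to Runs Allowed)."""
    scale = plate_appearances / 100.0
    return rs_rate_per_100 * scale, ra_rate_per_100 * scale

roster = [
    ("J. Damon", 613, 14.5, 0.9),  # the example line from the text
]

team_rs = 0.0
team_ra = 0.0
for name, pa, rs_rate, ra_rate in roster:
    rs, ra = player_runs(pa, rs_rate, ra_rate)
    team_rs += rs
    team_ra += ra
    print(f"{name}: {rs:.1f} added to runs scored, {ra:.1f} added to runs allowed")
```

With a full roster in the list, team_rs and team_ra are the two totals that go into the Pythagorean Formula.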
As a practical matter estimating playing time is the most fallible variable. Closely behind, however, is interpreting the defensive rate. This is how I've approached it to try to get a useful number to plug into the spreadsheet. ESPN lists in their fielding stats for each player (among other things):
In narrative form, J. Damon played 1256.1 innings in CF, had 359 Total Chances, and converted those chances into 349 outs and 5 assists. That's a Fielding Percentage of 98.6%. Sounds complete, but his Zone Rating is 87.9%. What's that all about? An outfit called STATS, Inc. has people at every major league game to chart the location of every batted ball. The field is divided into zones and fielding responsibility for each zone is assigned to a particular fielding position. J. Damon's Zone Rating tells us that he made a play on 87.9% of the balls batted into his zone of responsibility. The ZR is a more accurate model of reality than TC because the STATS scorers are not making a subjective judgment about whether or not an average center fielder "should" have made a play. The official scorers subjectively determine Total Chances and if in their opinion the ball could not have been caught, it was never batted, and it doesn't exist as a counting statistic. (The objectivity of ZR may be overstated in that STATS, Inc. does have "no-man zones" where no fielder has a responsibility.)
We can use the ZR to determine that J. Damon actually had 49 more fielding chances (359/.879, less the 359 already counted) than the conventional TC statistic shows. If we do this little exercise for each fielder on the team we determine that the 2004 Red Sox failed to make a play on 1,051 batted balls over and above the 5,811 chances tallied by the official scorers. We have some grist for the mill.
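The Damon arithmetic can be sketched in a few lines. This follows the calculation as described here (Total Chances divided by Zone Rating), and the function names are made up for the example.

```python
# Sketch: batted balls a fielder did not get to, per the TC / ZR arithmetic.
# Function names are hypothetical; the method is the one described in the text.

def zr_implied_chances(total_chances, zone_rating):
    """Chances the fielder would have had if every ball in his zone counted."""
    return total_chances / zone_rating

def unrecorded_balls(total_chances, zone_rating):
    """Balls in the fielder's zone that never show up in Total Chances."""
    return zr_implied_chances(total_chances, zone_rating) - total_chances

damon_extra = unrecorded_balls(359, 0.879)
print(f"J. Damon: about {damon_extra:.0f} extra chances")
```

Running the same function for every fielder and summing the results is how you would arrive at a team total like the 1,051 figure.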
The 2004 Red Sox allowed 768 runs of which 674 were earned (94 unearned. Ouch!). There were 159 opponents' home runs. If we assign 85% of the responsibility for the unearned runs to the fielders, and assign 100% responsibility for the 159 home runs to the pitchers, we get 595 runs allowed which involved only the fielders. The official scorers tell us that each batted ball handled by the fielders was worth .102 fielding runs (595/5811). If we apply this same rate to the 1,051 batted balls left out by the official scorers we determine that nobody has been assigned responsibility for 108 runs. Because we know the number of innings each fielder played in each position we can use their Zone Rating for that position to determine how many of those 108 runs should get plugged into their fielding rate for each position played in 2004. The .102 value is high compared to the five year major league average of .089, and .089 may be a better run value if you want to do cross-team or cross-year comparisons.
One side effect of this method is that offensive and defensive ratings can be netted out to compare players at each position. For example, at 1B Millar nets out about 2 runs higher than Minky per 100 plate appearances.
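A rough sketch of that run-value arithmetic follows. One assumption to flag loudly: the 595 figure is not broken down in the text, so the code takes it as runs allowed, minus one run per home run, minus the 15% of unearned runs not charged to the fielders. That reading reproduces 595, but it is my reconstruction.

```python
# Sketch of the fielding run value per batted ball for the 2004 Red Sox.
# ASSUMPTION: fielding runs = runs allowed - home runs - 15% of unearned runs.
# This reproduces the 595 in the text but the breakdown is not stated there.

RUNS_ALLOWED = 768
EARNED_RUNS = 674
HOME_RUNS = 159
OFFICIAL_CHANCES = 5811
UNRECORDED_BALLS = 1051

unearned = RUNS_ALLOWED - EARNED_RUNS

# Pitchers keep every home run and 15% of the unearned runs (assumed split).
fielding_runs = RUNS_ALLOWED - HOME_RUNS - 0.15 * unearned

# Runs per batted ball handled, then applied to the balls nobody got to.
run_value = fielding_runs / OFFICIAL_CHANCES
unassigned_runs = UNRECORDED_BALLS * run_value

print(f"fielding runs: {fielding_runs:.0f}")
print(f"run value per ball: {run_value:.3f}")
print(f"unassigned runs: {unassigned_runs:.0f}")
```

Swapping run_value for the .089 league average is a one-line change if you want cross-team comparisons.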
|PLAYER||RS RATE||RA RATE|
At a press conference following the Garciaparra trade Theo Epstein said that the trade was done to improve defense. What does Fielding Rate say about the result? Nomar played 311 innings at SS with a ZR of .694. He made 6 errors on the official scorer's 139 chances. ZR tells us that the scorer gave Nomar a pass on an additional 61 batted balls that went through the SS position that Nomar never got to. Cabrera played 491 innings with a ZR of .844 and made 8 errors on 233 official chances. ZR tells us that Cabrera didn't get to an additional 43 batted balls. After figuring in their respective playing time Cabrera gets a rating of 1.9 runs and Garciaparra 4.3 runs. Theo was correct. He accomplished his purpose. Cabrera more than halved the rate at which opponents' runs were being scored through the SS position.
UPDATE: Fielding Rate is very much a work in progress. I'm a much better shopper than designer for these kinds of things. After re-reading Tangotiger's "How Runs Are Really Created" the run value of an out in a 4.8 RPG environment looked like the trick, but after further experimentation it may be overkill. The major league average of .089 runs per fielded ball may be the most useful. It's interesting that .089 is almost exactly half the run value of an out.
Saturday, January 15, 2005
The Pedro Era
The table shows the five most dominating seasons by a starter over the last five years.
Wow! There appears little question about the identity of the best starting pitcher of the 21st Century as we reach the midpoint of the first decade. It wouldn't be too surprising to see that claim stand up well into the fifth or sixth decades either.
This table shows Pedro chronologically (age 29-33) for the last five years. The added metric is Base Runs (BsR) per 100 innings. You can use BsR to figure out how many runs a pitcher will add to the team's Runs Allowed total. A rating of 50 BsR per 100 innings is considered major league quality.
There's a trend here that suggests that Minaya did the right thing for the Mets by acquiring a marquee pitcher who has more than a few spectacular outings and front page stories left in his arm. Pedro's 2004 BsR looks more "average" than normal but the skinny FIP also shows that Pedro can still dominate a line-up.
I think the Red Sox did the right thing also by not buying a very expensive (and depreciating) asset near the top of the market. Pedro Martinez cannot be replaced. Not now. Not ever. The Red Sox will use the $50 million they didn't spend on Pedro to acquire quality arms for those 1,450 summer innings that get you to the playoffs.
Thursday, January 13, 2005
It's all about the runs
You play a baseball game by putting men on base and moving them around until they cross home. If you do that more times than the other guys you win that game. If you do it more times than the other guys in your division in the 162 opportunities you get in a season, your team gets into the playoffs and has a chance to win the World Series.
It's simple. It's intuitive. It took 100 years for somebody to describe how the dynamics of the game really work. In the late 1970s Bill James introduced his first Runs Created formula. Elegant in its simplicity, the formula finally connected the game-time events that get people on base and move them around to create runs.
RC = (HITS + WALKS)* TOTAL BASES / PLATE APPEARANCES
The 2004 Red Sox had 1,613 hits, 659 walks, and 2,702 total bases in about 6,514 plate appearances. If we plug those numbers into the RC formula these basic counting statistics tell us that the Red Sox created about 942 runs in the 2004 season. The actual number of runs is 949. Pretty close for such a simple description of real world events.
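If you'd rather use a keyboard than a pencil, the basic formula above is one line of code. This is just a sketch with the 2004 Red Sox totals plugged in; the function name is my own.

```python
# Sketch of the basic Runs Created formula exactly as written above:
# RC = (HITS + WALKS) * TOTAL BASES / PLATE APPEARANCES

def runs_created(hits, walks, total_bases, plate_appearances):
    return (hits + walks) * total_bases / plate_appearances

# 2004 Red Sox counting stats quoted in the text
rc = runs_created(hits=1613, walks=659, total_bases=2702, plate_appearances=6514)
print(f"Runs Created: {rc:.0f} (actual runs scored: 949)")
```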
Dan Agonistes has a series of very well written articles on the development of Runs Created formulas if you want to pursue this further. The important concept is that anybody with a pencil can fairly accurately model at least some of the underlying dynamics of baseball. Counting statistics like hits, walks, doubles, triples, and home runs are readily available for every person who ever played in the Major Leagues. With a little ingenuity the RC formula can also be applied to pitchers who are in a real sense the composite of all the batters they face. With a little faith that the future will somewhat resemble the past the RC formulas can be used to project the likely future performance of players and teams.
Bill James had one more rabbit to pull from his magical hat to increase our understanding of how the game works. This one is called the Win Expectancy Formula, or more commonly now, the Pythagorean because of its resemblance to that famous triangle thingie. The Pythagorean enables us to use the output of the RC formula to get a pretty good idea of how many wins we can expect from a team whose batters score X runs and whose pitchers/fielders allow Y runs over the course of a season.
WIN% = RUNS SCORED ^2 / (RUNS SCORED ^2 + RUNS ALLOWED ^2)
Take the 2004 Red Sox as an example. The Sox scored 949 runs and allowed 768 runs. Plugging these numbers into the formula we get a win expectancy of 60.4%. That comes out to 98 wins in a 162 game season. It so happens that the 2004 Red Sox won exactly 98 games. It doesn't always come out on the number, but again, the important concept is that we own some simple tools that allow us to model baseball reality with a fair degree of accuracy.
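The same calculation as a small sketch, using the exponent of 2 from the formula above. The function name and the exponent parameter are illustrative conveniences, not part of the original formula.

```python
# Sketch of the Pythagorean win expectancy:
# WIN% = RS^2 / (RS^2 + RA^2)

def pythagorean_win_pct(runs_scored, runs_allowed, exponent=2):
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)

GAMES = 162

# 2004 Red Sox: 949 runs scored, 768 runs allowed
win_pct = pythagorean_win_pct(949, 768)
expected_wins = win_pct * GAMES
print(f"Win expectancy: {win_pct:.1%}, about {expected_wins:.0f} wins")
```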
This article is background material for those folks not familiar with the insight you can get from the Runs Created Formula and the Pythagorean Formula. We cannot predict the future but we can get a good sense of how well equipped a team or a particular player is to compete by examining the sum of their parts that score or prevent runs.
Wednesday, January 12, 2005
The Red Sox and Yankee Infields (Fielding)
The following table shows Jeter's Fielding Rate for his entire Major League career. Rate is a Clay Davenport (Baseball Prospectus) invention as described by BP: A way to look at the fielder's rate of production, equal to 100 plus the number of runs above or below average this fielder is per 100 games. A player with a rate of 110 is 10 runs above average per 100 games, a player with an 87 is 13 runs below average per 100 games.
2004 is the first year that Jeter became a fielding asset and not a liability. There were 2 Yankee wins in Jeter's glove between his '03 season and '04 season if we accept the wisdom (and we should) that 10 runs scored or 10 runs prevented = 1 win.
If we just accept that A-Rod has magically made Jeter a better fielding SS who will not revert to the mean in 2005, then how do the Red Sox and Yankee infields compare defensively? We'll compare offense later.
The Yankees swapped Miguel Cairo/Enrique Wilson for Tony Womack/Felix Escalona at 2B and brought back Tino Martinez to spell Jason Giambi or replace him entirely at 1B. For our analysis we are assuming that Giambi will not be playing 1B in 2005.
We'll look at Yankee fielding first. The table is set up so that a negative number is a good thing. The number beside the player's name is the number of runs saved (subtracted from the team's Runs Allowed total) or the number of runs added to the team's Runs Allowed total over the course of the season. All the analysis is adjusted for projected playing time in 2005.
T. Martinez -9
T. Womack 7
D. Jeter -5
A. Rodriguez -11
The starting Yankee infield is expected to save 18 runs for the season, or close to two wins. When F. Escalona subs for Womack the fielding contribution jumps up to better than 25 runs saved as Womack's harmful numbers come out.
The Red Sox have added Edgar Renteria at SS. Theo is talking about trading either Kevin Millar or Doug Mientkiewicz before ST. We suspect that Millar will stay. That's not a bad thing. Red Sox Nation beats on Millar's fielding like a dirty rug but if we forget about his antics in RF, which we can do in good conscience with Jay Payton on the roster, and leave him at 1B, he deserves more love.
K. Millar -4
M. Bellhorn 7
E. Renteria 8
B. Mueller -6
Yikes! It appears that the Good Ship Red Sox may have a hole below the waterline that concedes two wins to the Empire before playing a game. Newly acquired Ramon Vazquez can ease the pain in late innings by replacing either Bellhorn or Renteria albeit at a high cost with the bat.
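For anyone who wants to check the arithmetic behind that hole, here is a quick sketch that totals the two infields from the tables above and converts the gap to wins at 10 runs per win. The dictionary layout and variable names are just for illustration.

```python
# Sketch: total the projected infield fielding runs and compare the teams.
# Negative numbers are runs saved; positive numbers are runs added to
# the team's Runs Allowed total.

yankees = {"T. Martinez": -9, "T. Womack": 7, "D. Jeter": -5, "A. Rodriguez": -11}
red_sox = {"K. Millar": -4, "M. Bellhorn": 7, "E. Renteria": 8, "B. Mueller": -6}

RUNS_PER_WIN = 10  # rule of thumb: 10 runs scored or prevented = 1 win

yankee_total = sum(yankees.values())
red_sox_total = sum(red_sox.values())

# Positive gap means the Red Sox infield allows that many more runs.
gap_runs = red_sox_total - yankee_total
gap_wins = gap_runs / RUNS_PER_WIN

print(f"Yankees infield: {yankee_total:+d} runs")
print(f"Red Sox infield: {red_sox_total:+d} runs")
print(f"Gap: {gap_runs} runs, about {gap_wins:.1f} wins")
```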
It's still weeks before ST and we'll look at this again, but Jeter's new glove, if he can find it again in Tampa, may be the baseball gods' gift to the Empire.