Friday, February 25, 2005

Three Shortstops

The graphs below show three active shortstops on three contending teams. The players have about the same amount of time in the bigs and the graphs are scaled the same so each graph is directly comparable with the others. The data is an annual batting rate: Contact Rate ((AB - K) / AB) multiplied by Isolated Power ((Total Bases - Hits) / AB). I added a constant to all the calculations so that the result looked more like the batting average, on-base percentage or slugging percentage that we see all the time. Higher is better. The straight line is a "best fit" of the raw data and the slope should show a trend that is likely to continue through at least the next data point, which in this case is the 2005 season.
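For anyone who wants to rebuild the rate at home, here is a rough Python sketch of the calculation and the trendline. The season lines are made up for illustration, the function names are hypothetical, and the constant is left as a parameter (defaulting to zero) because its value isn't given here. Treat it as one way to do it, not the exact spreadsheet behind the graphs.

```python
def batting_rate(ab, k, hits, total_bases, constant=0.0):
    """Contact Rate times Isolated Power, plus an additive constant.

    ASSUMPTION: the constant's real value is not stated in the post.
    """
    contact_rate = (ab - k) / ab
    isolated_power = (total_bases - hits) / ab
    return contact_rate * isolated_power + constant


def best_fit(xs, ys):
    """Ordinary least-squares line; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Hypothetical season lines (year, AB, K, H, TB); not any real player's stats.
seasons = [
    (2002, 600, 100, 180, 280),
    (2003, 600, 90, 175, 270),
    (2004, 600, 95, 170, 275),
]
years = [s[0] for s in seasons]
rates = [batting_rate(*s[1:]) for s in seasons]

# Extend the fitted line one data point forward, to the 2005 season.
slope, intercept = best_fit(years, rates)
projection_2005 = slope * 2005 + intercept
print(round(projection_2005, 3))
```

A positive slope means the line is pointing up for 2005; a negative slope is the ominous kind.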

What do they show? The top graph shows a consistent, high-level performer who can be expected to deliver the goods at about the same rate as the last few years. Maybe a little higher or a little lower than 2004 (.292/.352/.471) but we can surmise that this guy will be there contributing with an .820+ OPS.

The player in the middle graph has not achieved the higher, absolute levels of the other two but shows consistent improvement from year to year. This player probably has not yet peaked in his batting skills. We can look forward to a 2005 season, and a few more after that, that will most likely be an improvement over his 2004 season (.287/.327/.401) and more like his 2003 or 2002 season (.305/.364/.439).

The player in the bottom graph displayed awesome skills earlier in his career. A level of hitting talent the other two players will probably never even touch. But, alas. Fate has been unkind. A fastball on the wrist changed everything. The slope of the trendline is ominous, but even in a declining phase this player started from such a prodigious talent level that 2005 should produce respectable performance. Beyond that is an unknown risk.

The players are Derek Jeter, Edgar Renteria, and Nomar Garciaparra. But you probably already knew that.

Friday, January 28, 2005

PECOTA Put Me In A Quandary

I've been in worse places, but looking forward to the first run of PECOTA projections has resulted in disappointment. I'm not slamming the system. God bless anybody who would publish predictions about how a large number of human beings will act individually at a given task, much less charge money for it.

My winter project is to build a spreadsheet for each AL team that shows the runs created and runs allowed contribution for each player. I downloaded the Lahman database and went to work converting batting statistics and pitching statistics into Base Runs. I created what seems like a pretty effective method for converting fielding stats into runs allowed. Using simple exponential smoothing methods I came up with a projection for every player - and voila, it all seemed to work. The Red Sox and the Yankees came in at about 100 wins, the Twins around 90 wins, and the Angels a little higher than that. Team totals for runs scored and runs allowed were in the range of expectation. And then PECOTA came out.
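Simple exponential smoothing, for the curious, looks something like the sketch below. The smoothing weight alpha and the sample rates are placeholders, not values from the actual spreadsheet, and the function name is hypothetical.

```python
def exponential_smoothing(values, alpha=0.5):
    """Return the smoothed level after the last observation.

    Each new season is blended with the running level:
        level = alpha * value + (1 - alpha) * level
    The final level serves as the projection for the next season.
    ASSUMPTION: alpha=0.5 is an arbitrary illustration, not a tuned weight.
    """
    level = values[0]
    for value in values[1:]:
        level = alpha * value + (1 - alpha) * level
    return level


# Hypothetical runs created per 100 PA over three seasons, oldest first.
history = [12.0, 14.0, 15.0]
projection = exponential_smoothing(history)
print(projection)
```

A higher alpha leans harder on the most recent season; a lower one trusts the longer track record.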

I replaced all my batting and pitching numbers with PECOTA numbers, with these results. The Red Sox went down 80 runs scored from '04 and 90 runs from '03. Runs allowed went through the roof when I included the fielding component, and wins dropped to 90 or less. Same results with the other teams for which I substituted numbers. Now, I either have to believe that Nate Silver has tons of prescience or that he screwed the pooch with this year's PECOTA projections.

Saturday, January 22, 2005

Bad Moon Rising For Trot?

Trot Nixon has been the regular Red Sox right fielder since 1999. He's a quiet guy, a class act, and a heck of a baseball player. It's an old baseball paradigm, and one that's largely true, that position players peak statistically at age 27. The graph below is suggesting that there may be a Bad Moon rising on Trot Nixon's playing career. Due to a variety of nagging injuries his plate appearances have dropped off dramatically since age 27. Remarkably Trot's high level of performance has not followed suit. His offensive contribution measured by Runs Created Per 100 Plate Appearances has remained within spitting distance of the league leaders every year, injuries or no. Just because there's a cloud in the sky doesn't mean that it has to rain, and old baseball paradigms don't make the future. Here's hoping for a healthy 2005 for a guy I'm very happy to see wear the laundry.

The Age 27 Paradigm

The last data point is a 2005 projection by a neural net. I ran a separate projection for each offensive counting statistic. It caught my attention when all the independent projections fit together to make a reasonable Trot-like season with low PAs. It's the first year I ever tried using neural nets for baseball forecasting so who knows. The same neural net also forecast that Randy Johnson would pitch only 143 innings in '05 so it can't be all bad.

Tuesday, January 18, 2005

Johnson v. Schilling

The summer-long heavyweight bout between the Red Sox and the Yankees will begin on Major League Baseball's Opening Day when, before the very first pitch of Dynasty Year III is thrown, the 2004 World Champion Boston Red Sox will unfurl their trophy banner deep within the heart of the Empire's Death Star. There's some delicious irony there.

Randy Johnson will marquee the Opener for the Yankees but much to the disappointment of television execs and baseball fans everywhere Curt Schilling likely will not be available for service until some weeks later. With 20 of these sumo matches on the schedule we'll only have to delay our enthusiasm.

How do Randy Johnson and Curt Schilling match up over the last four years? You may not be familiar with BsR and FIP. If not, please scroll down to the Pedro Martinez post for an explanation and links.

R. Johnson      2001    249.7    .581    30.65    -1.137
C. Schilling    2002    259.3    .612    36.33    -0.602
R. Johnson      2003    114      .784    57.54     0.342
C. Schilling    2003    168      .624    37.42    -0.423
R. Johnson      2004    245.7    .555    28.76    -0.871
C. Schilling    2004    226.7    .657    40.72    -0.009

I'd give Randy Johnson the statistical edge: he has two seasons in which he pitched like an elite closer for 240+ innings. But, really? What's the difference between getting smashed by a truck that weighs 30,000# and one that weighs 35,000#?

Sunday, January 16, 2005

Fielding Rate

There are three months before Opening Day. The Red Sox and Yankee rosters for 2005 are fairly well set, so which team has the best chance (on paper) to win the Division in '05? I'm not forgetting that there are three other teams in the Division, but unless Baltimore does something spectacular with its pitching between now and Opening Day there are only two teams in the race.

Although it sounds ambitious these projections are straightforward. The overall strategy is to interpret for each roster member, as appropriate for their position, an offensive production rate, a defensive production rate, and a pitching production rate; convert these rates into runs scored or runs allowed per player based on their estimated playing time, and then total the numbers and run them through the Pythagorean Formula to quantify the team's winning percentage. For each non-pitcher in the American League you want to end up with something like this to plug into a spreadsheet: J. Damon    613  14.5  .9. That's the player's name, estimated plate appearances, offensive rate per 100 PA (add to runs scored), and defensive rate per 100 PA (add to runs allowed).
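Here is a minimal Python sketch of how one of those spreadsheet rows turns into runs. The class and field names are hypothetical; only the J. Damon line (613, 14.5, .9) comes from the text above, and pitchers would need their own rate handled separately.

```python
from dataclasses import dataclass


@dataclass
class PlayerLine:
    name: str
    plate_appearances: int       # estimated playing time
    offense_per_100_pa: float    # runs added to the team's runs scored
    defense_per_100_pa: float    # runs added to the team's runs allowed

    def runs_scored(self) -> float:
        return self.plate_appearances / 100 * self.offense_per_100_pa

    def runs_allowed(self) -> float:
        return self.plate_appearances / 100 * self.defense_per_100_pa


def team_totals(lines):
    """Sum every player's contribution to runs scored and runs allowed."""
    scored = sum(p.runs_scored() for p in lines)
    allowed = sum(p.runs_allowed() for p in lines)
    return scored, allowed


damon = PlayerLine("J. Damon", 613, 14.5, 0.9)
print(round(damon.runs_scored(), 1), round(damon.runs_allowed(), 1))
```

The two team totals are what get run through the Pythagorean Formula at the end.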

As a practical matter estimating playing time is the most fallible variable. Closely behind, however, is interpreting the defensive rate. This is how I've approached it to try to get a useful number to plug into the spreadsheet. ESPN lists in their fielding stats for each player (among other things):

CF    1256.1    359    349    5    .879

In narrative form, J. Damon played 1256.1 innings in CF, had 359 Total Chances, and converted those chances into 349 outs and 5 assists. That's a Fielding Percentage of 98.6%. Sounds complete, but his Zone Rating is 87.9%. What's that all about? An outfit called STATS, Inc. has people at every major league game to chart the location of every batted ball. The field is divided into zones and fielding responsibility for each zone is assigned to a particular fielding position. J. Damon's Zone Rating tells us that he made a play on 87.9% of the balls batted into his zone of responsibility. The ZR is a more accurate model of reality than TC because the STATS scorers are not making a subjective judgment about whether or not an average center fielder "should" have made a play. The official scorers subjectively determine Total Chances and if in their opinion the ball could not have been caught, it was never batted, and it doesn't exist as a counting statistic. (The objectivity of ZR may be overstated in that STATS, Inc. does have "no-man zones" where no fielder has a responsibility.)

We can use the ZR to determine that J. Damon actually had 49 more fielding chances (359/.879, less the 359 counted) than the conventional TC statistic alone implies. If we do this little exercise for each fielder on the team we determine that the 2004 Red Sox failed to make a play on 1,051 batted balls beyond the 5,811 tallied by the official scorer. We have some grist for the mill.
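The arithmetic behind that "little exercise" can be sketched in a few lines of Python. The function name is hypothetical; the inputs are Damon's line above plus the Garciaparra and Cabrera numbers quoted later in this post.

```python
def missed_chances(total_chances, zone_rating):
    """Batted balls in the fielder's zone that never became official chances.

    Dividing official chances by Zone Rating gives the implied number of
    balls hit into the zone; subtracting the official chances leaves the
    balls nobody charged to the fielder.
    """
    return total_chances / zone_rating - total_chances


for name, tc, zr in [
    ("J. Damon", 359, 0.879),
    ("N. Garciaparra", 139, 0.694),
    ("O. Cabrera", 233, 0.844),
]:
    print(name, round(missed_chances(tc, zr)))
```

Rounded to whole balls, these come out to the 49, 61 and 43 quoted in the text.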

The 2004 Red Sox allowed 768 runs of which 674 were earned (94 unearned. Ouch!). There were 159 opponents' home runs. If we assign 85% of the responsibility for the unearned runs to the fielders, and assign 100% responsibility for the 159 home runs to the pitchers, we get 595 runs allowed which involved only the fielders. The official scorers' totals tell us that each batted ball handled by the fielders was worth .102 fielding runs (595/5811). If we apply this same rate to the 1,051 batted balls left out by the official scorers we determine that nobody has been assigned responsibility for 108 runs. Because we know the number of innings each fielder played in each position we can use their Zone Rating for that position to determine exactly how many of those 108 runs should get plugged into their fielding rate for each position played in 2004. The .102 value is high compared to the five year major league average of .089 and .089 may be a better run value if you wanted to do cross-team or cross-year comparisons.
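One reading of that arithmetic, sketched in Python. The subtraction below is an assumption that happens to reproduce the 595 figure: take total runs allowed, remove one run per home run, and remove the 15% of unearned runs not charged to the fielders. The function names are hypothetical.

```python
def fielding_runs(runs_allowed, earned_runs, home_runs, fielder_share=0.85):
    """Runs allowed that involved the fielders.

    ASSUMPTION: each home run is treated as one run charged to the pitcher,
    and the fielders keep `fielder_share` of the unearned runs. This is a
    reconstruction that matches the 595 in the text, not a confirmed method.
    """
    unearned = runs_allowed - earned_runs
    return runs_allowed - home_runs - (1 - fielder_share) * unearned


def run_value_per_ball(fielding_run_total, official_chances):
    """Fielding runs per batted ball handled by the fielders."""
    return fielding_run_total / official_chances


sox_fielding_runs = fielding_runs(768, 674, 159)              # about 595
sox_run_value = run_value_per_ball(sox_fielding_runs, 5811)   # about .102
unassigned_runs = 1051 * sox_run_value                        # runs nobody was charged with

print(round(sox_fielding_runs), round(sox_run_value, 3), round(unassigned_runs, 1))
```

Depending on where you round, the unassigned total lands within a run of the 108 used above.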

One side effect of this method is that offensive and defensive ratings can be netted out to compare players at each position. For example, at 1B Millar nets out about 2 runs higher than Minky per 100 plate appearances.

K. Millar          15.2    6.4
D. Mientkiewicz    11.6    4.4

At a press conference following the Garciaparra trade Theo Epstein said that the trade was done to improve defense. What does Fielding Rate say about the result? Nomar played 311 innings at SS with a ZR of .694. He made 6 errors on the official scorer's 139 chances. ZR tells us that the scorer gave Nomar a pass on an additional 61 batted balls that went through the SS position that Nomar never got to. Cabrera played 491 innings with a ZR of .844 and made 8 errors on 233 official chances. ZR tells us that Cabrera didn't get to an additional 43 batted balls. After figuring in their respective playing time Cabrera gets a rating of 1.9 runs and Garciaparra 4.3 runs. Theo was correct. He accomplished his purpose. Cabrera more than halved the rate at which opponents' runs were being scored through the SS position.

UPDATE: Fielding Rate is very much a work in progress. I'm a much better shopper than designer for these kinds of things. After re-reading Tangotiger's "How Runs Are Really Created" the run value of an out in a 4.8 RPG environment looked like the trick, but after further experimentation it may be overkill. The major league average of .089 runs per fielded ball may be the most useful. It's interesting that .089 is almost exactly half the run value of an out.

Saturday, January 15, 2005

The Pedro Era

Pedro's 1999-2000 years are probably as good as a human being can throw a baseball. One measure of raw pitching talent is a metric called FIP (Fielding Independent Pitching). Tangotiger created FIP to gain some insight into how well a pitcher executes those things over which he has the most control, like strikeouts, walks, and home runs. A catchier name may be Ability to Dominate because that's really what this metric shows. Pitchers with sub-zero FIP take the ball out of play, keep baserunners to a lonely few, and send people home early. For perspective, a 1.34 FIP will keep you playing in the show and earning a cool million per year.
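As a rough sketch, FIP in its commonly cited form weights home runs, walks and strikeouts per inning. The version below leaves off the league constant that is often added to put FIP on an ERA scale, which seems to be how the sub-zero numbers in this post were figured; that is an inference, not something stated here. The stat line used (116.2 innings, 5 HR, 25 BB, 163 K for Pedro's 2001) is from memory and worth checking against the Lahman database.

```python
def fip_no_constant(home_runs, walks, strikeouts, innings):
    """Fielding Independent Pitching without the league constant.

    ASSUMPTION: uses the commonly cited weights (13, 3, -2) and omits
    hit batsmen and the ERA-scaling constant.
    """
    return (13 * home_runs + 3 * walks - 2 * strikeouts) / innings


# 116.2 in baseball notation means 116 and two-thirds innings.
pedro_2001 = fip_no_constant(home_runs=5, walks=25, strikeouts=163, innings=116 + 2 / 3)
print(round(pedro_2001, 3))
```

If the recalled stat line is right, the result lines up with the -1.594 listed for Pedro's 2001 season.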

The table shows the five most dominating seasons by a starter over the last five years.

Pedro Martinez    2001    -1.594
Pedro Martinez    2000    -1.157
Randy Johnson     2001    -1.137
Pedro Martinez    2003    -0.964
Pedro Martinez    2002    -0.948

Wow! There appears little question about the identity of the best starting pitcher of the 21st Century as we reach the midpoint of the first decade. It wouldn't be too surprising to see that claim stand up well into the fifth or sixth decades either.

This table shows Pedro chronologically (age 29-33) for the last five years. The added metric is Base Runs (BsR) per 100 innings. You can use BsR to figure out how many runs a pitcher will add to the team's Runs Allowed total. A rating of 50 BsR per 100 innings is considered major league quality.

2001    117    25.97    -1.594
2004    217    44.41     0.309

There's a trend here that suggests that Minaya did the right thing for the Mets by acquiring a marquee pitcher who has more than a few spectacular outings and front page stories left in his arm. Pedro's 2004 BsR looks more "average" than normal but the skinny FIP also shows that Pedro can still dominate a line-up.

I think the Red Sox did the right thing also by not buying a very expensive (and depreciating) asset near the top of the market. Pedro Martinez cannot be replaced. Not now. Not ever. The Red Sox will use the $50 million they didn't spend on Pedro to acquire quality arms for those 1,450 summer innings that get you to the playoffs.

Thursday, January 13, 2005

It's all about the runs

You can enjoy a baseball game, a season, the tradition, the sport without ever even looking at a newspaper box score. The numbers are there. You don't need them to enjoy the aesthetics of the game, but if you want to understand the hidden order within the chaos of the hundreds of thousands of individual events that comprise a professional baseball season, it's all about the runs.

You play a baseball game by putting men on base and moving them around until they cross home. If you do that more times than the other guys you win that game. If you do it more times than the other guys in your division in the 162 opportunities you get in a season, your team gets into the playoffs and has a chance to win the World Series.

It's simple. It's intuitive. It took 100 years for somebody to describe how the dynamics of the game really work. In the late 1970s Bill James introduced his first Runs Created formula. Elegant in its simplicity, the formula finally connected the game-time events that get people on base and move them around to create runs.


The 2004 Red Sox had 1,613 hits, 659 walks, and 2,702 total bases in about 6,514 plate appearances. If we plug those numbers into the RC formula these basic counting statistics tell us that the Red Sox created about 942 runs in the 2004 season. The actual number of runs is 949. Pretty close for such a simple description of real world events.
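In Python, the basic version of Runs Created comes down to one line. The sketch assumes the simplest form of the formula, (hits + walks) times total bases divided by plate appearances, which is the form that reproduces the 942 above; the function name is hypothetical.

```python
def runs_created(hits, walks, total_bases, plate_appearances):
    """Basic Runs Created: (on-base events * advancement) / opportunities."""
    return (hits + walks) * total_bases / plate_appearances


# 2004 Red Sox totals quoted in the text.
red_sox_2004 = runs_created(hits=1613, walks=659, total_bases=2702, plate_appearances=6514)
print(round(red_sox_2004))
```

Later versions of the formula add terms for steals, double plays and the like, but the basic one already lands within seven runs of the actual 949.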

Dan Agonistes has a series of very well written articles on the development of Runs Created formulas if you want to pursue this further. The important concept is that anybody with a pencil can fairly accurately model at least some of the underlying dynamics of baseball. Counting statistics like hits, walks, doubles, triples, and home runs are readily available for every person who ever played in the Major Leagues. With a little ingenuity the RC formula can also be applied to pitchers who are in a real sense the composite of all the batters they face. With a little faith that the future will somewhat resemble the past the RC formulas can be used to project the likely future performance of players and teams.

Bill James had one more rabbit to pull from his magical hat to increase our understanding of how the game works. This one is called the Win Expectancy Formula, or more commonly now, the Pythagorean because of its resemblance to that famous triangle thingie. The Pythagorean enables us to use the output of the RC formula to get a pretty good idea of how many wins we can expect from a team whose batters score X runs and whose pitchers/fielders allow Y runs over the course of a season.


Take the 2004 Red Sox as an example. The Sox scored 949 runs and allowed 768 runs. Plugging these numbers into the formula we get a win expectancy of 60.4%. That comes out to 98 wins in a 162 game season. It so happens that the 2004 Red Sox won exactly 98 games. It doesn't always come out on the number, but again, the important concept is that we own some simple tools that allow us to model baseball reality with a fair degree of accuracy.
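The same calculation as a short Python sketch. It assumes the classic exponent of 2, which reproduces the 60.4% above; other exponents are sometimes used, so it is left as a parameter. The function names are hypothetical.

```python
def pythagorean_win_pct(runs_scored, runs_allowed, exponent=2.0):
    """Expected winning percentage from runs scored and runs allowed."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


def expected_wins(runs_scored, runs_allowed, games=162, exponent=2.0):
    return pythagorean_win_pct(runs_scored, runs_allowed, exponent) * games


win_pct = pythagorean_win_pct(949, 768)
wins = expected_wins(949, 768)
print(round(win_pct, 3), round(wins))
```

Feeding it projected team totals instead of actual ones is how a roster spreadsheet turns into a win estimate.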

This article is background material for those folks not familiar with the insight you can get from the Runs Created Formula and the Pythagorean Formula. We cannot predict the future but we can get a good sense of how well equipped a team or a particular player is to compete by examining the sum of their parts that score or prevent runs.

Wednesday, January 12, 2005

The Red Sox and Yankee Infields (Fielding)

Something interesting happened in '04. (Something other than the Red Sox winning the World Series in Dynasty Year Two, of course). Derek Jeter, the poster boy of SABR defensive ineptitude, learned how to field a groundball in his tenth year in the show. The other possibility is that statistical defensive analysis is not now nor has it ever been worth a damn but we won't go there, not today anyway.

The following table shows Jeter's Fielding Rate for his entire Major League career. Rate is a Clay Davenport (Baseball Prospectus) invention as described by BP: A way to look at the fielder's rate of production, equal to 100 plus the number of runs above or below average this fielder is per 100 games. A player with a rate of 110 is 10 runs above average per 100 games, a player with an 87 is 13 runs below average per 100 games.

1995 85
1996 91
1997 91
1998 93
1999 89
2000 86
2001 88
2002 89
2003 83
2004 103

2004 is the first year that Jeter became a fielding asset and not a liability. There were 2 Yankee wins in Jeter's glove between his '03 season and '04 season if we accept the wisdom (and we should) that 10 runs scored or 10 runs prevented = 1 win.
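A quick Python sketch of that arithmetic, taking Rate as runs above or below average per 100 games and using the 10-runs-per-win rule of thumb. The function names are hypothetical, and the swing is figured per 100 games, as Rate itself is.

```python
def rate_to_runs(rate, games=100):
    """Runs above (+) or below (-) average implied by a Rate over `games`."""
    return (rate - 100) * games / 100


def wins_gained(rate_before, rate_after, games=100, runs_per_win=10):
    """Change in wins when a fielder moves from one Rate to another."""
    swing = rate_to_runs(rate_after, games) - rate_to_runs(rate_before, games)
    return swing / runs_per_win


# Jeter's Rate went from 83 in 2003 to 103 in 2004.
print(wins_gained(83, 103))
```

Over a full season's worth of games the swing would be a bit larger than the two wins shown per 100 games.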

If we just accept that A-Rod has magically made Jeter a better fielding SS who will not revert to the mean in 2005, then how do the Red Sox and Yankee infields compare defensively? We'll compare offense later.

The Yankees swapped Miguel Cairo/Enrique Wilson for Tony Womack/Felix Escalona at 2B and brought back Tino Martinez to spell Jason Giambi or replace him entirely at 1B. For our analysis we are assuming that Giambi will not be playing 1B in 2005.

We'll look at Yankee fielding first. The table is set up so that a negative number is a good thing. The number beside the player's name is the number of runs saved (subtracted from the team's Runs Allowed total) or the number of runs added to the team's Runs Allowed total over the course of the season. All the analysis is adjusted for projected playing time in 2005.

T. Martinez -9
T. Womack 7
D. Jeter -5
A. Rodriguez -11

The starting Yankee infield is expected to save 18 runs for the season, or close to two wins. When F. Escalona subs for Womack the fielding contribution jumps up to better than 25 runs saved as Womack's harmful numbers come out.

The Red Sox have added Edgar Renteria at SS. Theo is talking about trading either Kevin Millar or Doug Mientkiewicz before ST. We suspect that Millar will stay. That's not a bad thing. Red Sox Nation beats on Millar's fielding like a dirty rug but if we forget about his antics in RF, which we can do in good conscience with Jay Payton on the roster, and leave him at 1B, he deserves more love.

K. Millar    -4
M. Bellhorn    7
E. Renteria    8
B. Mueller    -6

Yikes! It appears that the Good Ship Red Sox may have a hole below the waterline that concedes two wins to the Empire before playing a game. Newly acquired Ramon Vazquez can ease the pain in late innings by replacing either Bellhorn or Renteria albeit at a high cost with the bat.
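Totting up the two infields takes only a few lines of Python. The numbers are the ones in the tables above; the variable names are hypothetical, and the wins conversion uses the same 10-runs-per-win rule of thumb.

```python
# Negative numbers are runs saved; positive numbers are runs given away.
yankees = {"T. Martinez": -9, "T. Womack": 7, "D. Jeter": -5, "A. Rodriguez": -11}
red_sox = {"K. Millar": -4, "M. Bellhorn": 7, "E. Renteria": 8, "B. Mueller": -6}

yankee_total = sum(yankees.values())
red_sox_total = sum(red_sox.values())

# Runs the Red Sox infield concedes relative to the Yankee infield.
gap_in_runs = red_sox_total - yankee_total
gap_in_wins = gap_in_runs / 10

print(yankee_total, red_sox_total, gap_in_runs, gap_in_wins)
```

The Yankee infield comes out at 18 runs saved and the Red Sox infield at 5 runs given away, a 23-run gap that is the hole below the waterline.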

It's still weeks before ST and we'll look at this again, but Jeter's new glove, if he can find it again in Tampa, may be the baseball gods' gift to the Empire.
