Click to open >> Pirates 2006 one-run games.
This is the plot that motivated the BBSP.
One-run games (games decided by one run) appear right above the diagonal if the Pirates lost these games, and right below the diagonal if the Pirates were on the winning side. You could count the blue x's and black dots, but you can get the general idea just by looking at the plot.
Many blue x's sit right above the diagonal, indicating a terrible record in one-run games in the first half. Some black dots sit right below the diagonal, indicating the subsequent improvement in such games in the second half.
In any case, it's tough being a Pirate fan. 2-13 since All-Star Break. Ouch.
Monday, July 30, 2007
Pittsburgh Pirates 2006
Posted by beetama74 0 comments
Labels: example
Friday, July 27, 2007
Known Issues
Here are the known issues that our team is working on right now:
Altoona Mountain City (1884) doesn't plot. I think the issue is that they didn't play any games on Sunday.
Many teams in the 1950s or before: the Pitcher with Decision split acts funny. So does the Opponent Pitcher with Decision split. Example: Pittsburgh Pirates 1950 and Pitcher with Decision.
I found out what the problem is. Retrosheet doesn't have data for "losing pitcher" for the old days (pre 1950?). So when the games are split based on "Pitcher with Decision", you'll notice that nobody has any losses. That's because nobody is in the "losing pitcher" column. At least the number of wins seems to be counted correctly.
"Opponent Pitcher with Decision" is present only when the opponent's pitcher is the winning pitcher, so with this split, too, only the opponent pitcher's wins (and thus the Pirates' losses) appear.
We need a rational way to address this issue.
Ties don't appear in the record. Well, it's because I assumed that there are no tie games in MLB when I wrote the original code. I should have taken a history lesson.
Downloading a pdf file seems to be a problem with Firefox. I thought we had fixed this... Fixed!
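To make the "Pitcher with Decision" issue above concrete, here is a rough Python sketch (not the project's R code) of a per-pitcher tally that keeps games with a missing pitcher under an explicit placeholder instead of silently dropping them. The dictionary keys and the "(unknown)" label are hypothetical, chosen only for illustration; they are not Retrosheet field names, and this is just one possible way to handle the gap.

```python
from collections import defaultdict

def tally_decisions(games):
    """Tally wins and losses per pitcher for the selected team.

    Each game is a dict with hypothetical keys: 'team_won' (bool),
    'winning_pitcher' and 'losing_pitcher' (a name, or None when missing).
    """
    record = defaultdict(lambda: {"W": 0, "L": 0})
    for game in games:
        if game["team_won"]:
            name, column = game["winning_pitcher"], "W"
        else:
            name, column = game["losing_pitcher"], "L"
        # Keep the game under a placeholder rather than dropping it.
        record[name if name else "(unknown)"][column] += 1
    return dict(record)

# Made-up games for illustration only, not Retrosheet data.
games = [
    {"team_won": True, "winning_pitcher": "Pitcher A", "losing_pitcher": None},
    {"team_won": False, "winning_pitcher": "Opp X", "losing_pitcher": None},
    {"team_won": False, "winning_pitcher": "Opp Y", "losing_pitcher": None},
]
record = tally_decisions(games)
```

With the made-up data, the two losses land under "(unknown)" rather than vanishing, so the split would at least show that losses exist but cannot be assigned to a pitcher.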
If you find anything else, please let us know via comments.
Posted by beetama74 2 comments
Labels: update
Tuesday, July 24, 2007
Up and Running
The Bivariate Baseball Score Plot project is now open to the public.
Its official birthday is 7/24/07.
Go to http://data.vanderbilt.edu/rapache/bbplot/ and have fun!
If you have any comments, please leave them on this blog.
Bug reports are also welcome. We're still working on it.
Posted by beetama74 1 comments
Labels: update
Thursday, July 19, 2007
How to Read a Bivariate Baseball Score Plot.
Basic Ideas
The bivariate baseball score plots present summary information for Major League Baseball teams’ game scores.
Each game is represented as one mark in the joint score distribution grid and one mark in each marginal. Splits based on a variety of game parameters (starting pitcher, day/night, home/away, etc.) are available; different values of the split parameter are differentiated by color and shape. Games are collected into little groups (in this case, groups of 3) so as to maintain a rational aspect ratio for the overall plot.
Marginal Distributions
The marginal score distributions are shown on the top for the selected team and along the left side for that team’s opponents; also shown via small tick marks on the runs scale are the overall mean runs per game (rpg), along with mean rpg for games meeting, and those not meeting, the split criterion.
Arbitrarily, games meeting the split criterion are placed at the bottom in each stack. Reference lines are drawn to improve one’s ability to quickly count games in a column or row. If there are games with scores in excess of the arbitrary maximum (here, 15), a plus sign is added to denote the presence of such games.
The marginals are oriented along the top and left sides so as to facilitate comparison between the marginals. A simple twist of the head allows visual comparison of the two marginals without needing to reverse the positive direction mentally, as would be necessary if the marginals were shown protruding away from the center and located in the traditional bottom and left side positions.
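The tick marks described above boil down to three averages plus a flag for scores past the display maximum. A minimal Python sketch, assuming each game is reduced to a run total and a true/false split indicator (the function and key names are hypothetical, and the project itself was written in R):

```python
from statistics import mean

MAX_RUNS = 15  # the arbitrary display maximum mentioned in the text

def marginal_summary(runs, meets_split):
    """Summarize one marginal: overall, in-split, and out-of-split rpg.

    runs: runs scored in each game; meets_split: parallel list of booleans.
    """
    in_split = [r for r, m in zip(runs, meets_split) if m]
    out_split = [r for r, m in zip(runs, meets_split) if not m]
    return {
        "overall_rpg": mean(runs),
        "split_rpg": mean(in_split) if in_split else None,
        "other_rpg": mean(out_split) if out_split else None,
        # True when a plus sign would be needed past the maximum.
        "overflow": any(r > MAX_RUNS for r in runs),
    }

# Made-up run totals for illustration only.
runs = [0, 4, 16, 2, 3]
meets_split = [True, False, False, True, False]
summary = marginal_summary(runs, meets_split)
```

The same function can be applied once to the selected team's runs and once to the opponents' runs to get the ticks for both marginals.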
Joint Distribution
The joint distribution is shown as collected marks in small squares. Victories for the selected team will be below the diagonal, losses above. One-run games will be just above and below the diagonal. Again, games are grouped for ease in counting; the squares are shaded in relation to the number of games they contain. Thus, more ink means more data. This gives a layered presentation of the data; the overall distribution is visible from afar, while atomic-level datum details are available upon closer inspection. “Reward the viewer for mental and visual investment in the graphic.”
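The placement rules above are easy to express in code. The following Python sketch (an illustration only, not the authors' R implementation) counts games per score cell, places each game relative to the diagonal, and tallies the one-run record. All function names and the sample scores are made up for the example.

```python
from collections import Counter

def classify(team_runs, opp_runs):
    """Say where a game falls relative to the diagonal of the joint grid."""
    if team_runs == opp_runs:
        return "tie"      # on the diagonal
    if team_runs > opp_runs:
        return "win"      # below the diagonal
    return "loss"         # above the diagonal

def cell_counts(games):
    """Count games in each (team_runs, opp_runs) square."""
    return Counter(games)

def one_run_record(games):
    """Return (wins, losses) in games decided by exactly one run."""
    wins = sum(1 for team, opp in games if team - opp == 1)
    losses = sum(1 for team, opp in games if opp - team == 1)
    return wins, losses

# Made-up (team_runs, opp_runs) pairs for illustration only.
games = [(0, 1), (3, 2), (2, 2), (7, 1), (4, 5), (3, 2)]
counts = cell_counts(games)
record = one_run_record(games)
```

A square's shading would then follow from its count in `counts`, and the one-run record comes from the two bands of squares next to the diagonal.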
Example: Astros, Roger Clemens in 2005
Open a png file.
Download a pdf version (31KB).
This example shows 163 games for the 2005 Houston Astros, with games started by Roger Clemens highlighted. The Astros finished the year 16 games over .500 but were 2 games under .500 in games which Clemens started.
The Astros' marginal score distribution (at the top of the plot) shows typical numbers from a good team: an overall average of over 4 rpg. Clemens, however, appears to have received less run support, as the Astros’ average offensive output in games he started is less than 3.5 rpg. Closer inspection reveals that Clemens was the unfortunate recipient of 9 of Houston’s 17 shutouts in 2005. While the Astros did score at least 7 runs for 5 of Clemens’ starts, the overall offensive support for Clemens actually was, well, offensive.
The Astros’ opponents’ marginal distribution (on the left) shows how teams fare against teams that beat them: their average rpg is just over 3.5 rpg compared with nearly 4.5 rpg for the Astros. Where the Astros were held to 1 run 27 times, their opponents were held to 1 or fewer on 42 occasions. Note that Clemens started 2 games that were shutouts and started 11 games where the opponents were held to fewer than 2 runs. He also started a game where the opponents scored 9 runs.
The joint distribution reveals details of Clemens’ abysmal run support. The bottom-left corner of the distribution shows five games which Clemens started in which the Astros lost 1-0, a pitcher’s nightmare. So, of the 11 games that Clemens started in which the opponents were held to fewer than 2 runs, 5 failed to produce a single Houston run. In fact, Clemens was the only Astros pitcher to start a game in which the team lost 1-0.
The joint distribution reveals a rather ordinary overall record of 25-21 in one-run games, a measure often heralded as a mark of good teams.
The keen eye will note a single game on the diagonal, a 2-2 tie. Prior to 2007, such games that were tied but suspended were kept on the books for purposes of individual statistics, but were replayed at the next available opportunity.
Disclaimer and Software Information
Data for the plots were obtained from retrosheet.org. Programming was done using the R environment for statistical computing and graphics.
An interactive website is available for examining score distributions of any team in the retrosheet database from 1876-2006 at http://data.vanderbilt.edu/rapache/bbplot/ .
Posted by rafe donahue 2 comments
Labels: introduction
Tuesday, July 17, 2007
Who we are...
rafe donahue
Day job: Biostatistician
Contribution to this project: Statistical philosophy, adult supervision, and guy willing to wear the tie at the presentation
Favorite MLB team: Brewers
Favorite NFL team: Packers
beetama74
Day job: Biostatistician
Contribution to this project: Statistical reasoning, R programming, and guy who created the original version of the plot
Favorite MLB team: Pirates
Favorite NFL team: Steelers
Jeffrey
Day job: Computer programmer
Contribution to this project: R/Apache implementation and a non-baseball guy's perspective
Favorite MLB team: unknown
Favorite NFL team: Titans (presumably)
Cole
Day job: Computer programmer
Contribution to this project: R/Apache implementation and a baseball guy's perspective
Favorite MLB team: Pirates
Favorite NFL team: Titans
Posted by beetama74 0 comments
Labels: introduction
Saturday, July 14, 2007
Bivariate Baseball Score Plot
It all started at the All-Star break of last season (2006). Pirate fans everywhere noticed that the Pittsburgh Pirates had lost many, many one-run games (games decided by one run).
They were 27-54 (.333) at the end of the first half, and they were 8-23 (.258) in one-run games. Obviously, the winning percentage (.258) and win-loss difference (-15) were the worst in the Majors. Then I thought, "How can I show the Pirates' record to accentuate their terrible performance in those one-run games?"
After some discussions with my colleagues, I came to the conclusion that the best way was to show everything. Not summaries, but every single datum = game. After some more discussions with my colleagues, I created what would become the Bivariate Baseball Score Plots. (And the Team was formed.)
So at the heart of this project, there is a dedicated and irate Pirate fan in Nashville. Well, actually our team of four happens to have 2 Pirate fans. The other two are a Brewer fan and a guy who doesn't care much about baseball.
Let's go Bucs!
Posted by beetama74 1 comments
Labels: introduction