A critical element in the dissemination of scientific discovery is the preparation of a paper for publication. Strong rules and traditions govern the writing of science for a journal. The tone should be sober and restrained, as if emotions and literary flourish do not exist. With “I” or “we” resisted if not banned, passive voice abounds and the text becomes littered with words and phrases like “thus,” “furthermore,” and “in this regard.” The vocabulary is limited and adjectives, beyond “significant” and “robust,” are as rare as hen’s teeth. Of course, similes, even if clever and apt, are verboten in science writing. Seeing such a simile, the average editor would blast the author for such license, howling “Jargon!” or at least demanding evidence from a large, cross-sectional study of a chicken coop - IACUC approved, of course - that hen’s teeth are truly rare.

Despite the glorious data described in the results sections of today’s scientific papers, the discussions are usually a slog, overflowing with disclaimers, qualifications and stipulations that can rumble on for pages. Inherently bland and enervated (and enervating), such writing can effectively transform a joyous cry of “Eureka!” into a muffled mumble of “Ho-hum.” To illustrate how the style of science writing can temper both thinking and feeling, I wrote the following account of a hypothetical World Series contest as if it were to appear in a prestigious publication like the New England Journal of Medicine. I think that this piece shows clearly what would happen if a reporter were to record, with the detachment, rationality and style demanded of a scientist, the drama and feverish excitement of one of the great happenings of American sports.

I wrote this piece at a time of optimism, before the 2012 playoffs and series actually occurred. Alas, the Bronx Bombers bombed and the Giants triumphed over the toothless Tigers. As a baseball fan from New York, all I can say is “Wait till next year.” As a scientist wanting to flash a little personality, liven up my craft and tell the world how snazzy and nifty my data really are, I can only echo the words of the great sage and philosopher Yogi Berra. “Take it with a grin of salt.”


A series of games was held between the New York Yankees and the San Francisco Giants to determine the best team in baseball. This series involved a superiority design, with the number of games played determined according to protocol. The primary outcome measure was 4 games won, with a committee of umpires adjudicating events during the game using a set of pre-determined rules. As results reported by ESPN showed, the Yankees had the better series record, winning 4 games to the Giants’ 3. These results are consistent with the hypothesis that the Yankees are the better team.

While the design for the current series is well-established, important aspects bear discussion. Thus, the primary outcome measure is number of games won as opposed to number of runs scored. A priori, number of runs scored would appear to be a better measure of a team’s capability since it can be established over time and would be less subject to variations at the end of low-scoring games, especially in a 7-game series. Indeed, a post-hoc analysis of the current series indicates that the Giants had outscored the Yankees significantly, with a 1-0 victory in game 7 giving the Yankees the title despite being outscored by 35 runs over the preceding 6 games. Furthermore, an analysis of area under the curve (AUC) fully supports the advantage of the Giants in time-averaged runs scored.

Another shortcoming in the current design is the absence of a power calculation. While a 7-game format has been used for the series for over 100 years, this issue has not been subjected to rigorous trials.

Further shortcomings of the current approach should be considered. Thus, many decisions throughout the game are made by umpires, especially the determination of balls and strikes, which can crucially impact the outcome. While umpires are considered to be impartial, the design of the series prevents blinding. In the future, new tools for determining these events in the game should be considered, as has been done in other games such as tennis. However, whereas in tennis lines clearly demarcate court boundaries, the strike zone is highly subjective.

In this regard, while the outcome is determined by a putatively objective measure (i.e., number of games won), there are no fan-reported measures. Baseball is a sporting event, but it also constitutes entertainment. As such, an assessment of fans seems appropriate. Reference to college football supports this idea. For many years, the national championship was determined by polls of the coaches or sportswriters. While voting occasionally produced discrepancies, the system appeared to be reliable. The more recent BCS series format, while nominally more objective, is nevertheless limited by the lack of an adequate number of head-to-head comparisons among teams, leaving its statistical methodology open to question.

Finally, the conduct of the World Series does not involve an economic analysis. Clearly, a survival analysis of fans is not possible, but consideration could be given to a quality of life assessment based on fan-reported outcomes. Unfortunately, as noted, the current format does not include assessment of this variable, although, in any such determination, there are many unknowns. Thus, it should be questioned whether fan assessment should be based on time-averaged measures or only the final result. For example, at the time San Francisco led the series, 3 games to 1, it could be argued that fan enjoyment was high and could have balanced their ultimate disappointment. On the other hand, assessment of fan opinion at the end of the series would be consistent with current outcome measures. In this regard, any such economic analysis must include costs associated with any post-game celebration, although metrics are lacking in this realm.

Thus, while the Yankees did have more victories than the Giants, we do not feel that the data available support their superiority in a statistically significant fashion. Future series will hopefully utilize more robust outcome measures, including well-validated markers, and determine more clearly the best team in baseball.