E. coli's career in science has been stellar so far. E. coli led a simple life as an inhabitant of our guts for thousands of years, until 20th-century scientists discovered that the bacterium was easy to grow and manipulate in the lab. E. coli rose to scientific fame and became a laboratory superstar. To date, scientists have published hundreds of thousands of papers on this little bug.

For a species that so many knowledgeable women and men have written about, it is surprising how much of its evolutionary history is still unclear. E. coli's general place in the tree of life is well resolved, but when scientists take a closer look at the E. coli family, relationships blur and become uncertain. Part of the problem is that different strains of E. coli seem to be evolving in different directions. If true, this "could reflect the end of E. coli as a species", researchers wrote in BMC Evolutionary Biology last month.

Scientists have long known that E. coli is not a single entity. Some strains are benign inhabitants of our guts, some make us deadly ill and some live only in laboratories. Most E. coli belong to one of five groups: A, B1, B2, D and E. These groups correlate with some of the biological traits of the bacteria that belong to them. For example, E. coli from group B2 often cause disease (although groups A, B1 and D also have their share of pathogenic bugs).

The evolutionary relationships between these groups are hazy, however. In 2009 scientists published a tentative family tree of E. coli in PLoS Genetics, based on comparisons of almost 2000 genes from 20 different strains. Despite the large scale of this research, the published tree is far from definitive: some of its branches were supported by only 11% of the analyzed genes.

To get a grip on the slippery E. coli, Shana Leopold and her colleagues decided to focus on its 'backbone DNA': long stretches of DNA with few differences between strains. The problem with comparing all E. coli genes, as the 2009 paper did, is that you also include genes that are often shuttled between strains. These mobile elements have different evolutionary histories than their hosts, and are therefore of limited use for resolving the hosts' family relationships.

But even with these stable stretches of DNA, Leopold could not solve the evolutionary puzzle of the E. coli family. She ended up with a different family tree depending on which segment of DNA she analyzed. In some trees, for example, group E appeared as an offshoot of group A, whereas in others it sat on one of the main branches.

This is counter-intuitive for anyone who believes in stable species and strains. The only way to explain this pattern is to accept that E. coli strains have mixed and matched (recombined) different portions of their DNA in the past. Suppose that at some point, group A transferred some of its DNA to group E. Today, that piece of DNA would make groups A and E look more closely related than the DNA adjacent to it does.
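To make the recombination argument concrete, here is a toy sketch in Python. The sequences and group labels below are invented for illustration and are not real E. coli data, and simply counting mismatches (the "p-distance") to find the closest relative is a much cruder method than the tree-building Leopold and colleagues used. It only shows the principle: if group E picked up one stretch of DNA from group A, that stretch points to A as E's nearest relative, while the adjacent stretch can point somewhere else entirely.

```python
# Toy illustration: recombination makes different DNA segments tell different stories.
# All sequences are invented for illustration; they are NOT real E. coli data.


def p_distance(a: str, b: str) -> float:
    """Fraction of aligned positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def closest_relative(target: str, segments: dict[str, dict[str, str]], segment: str) -> str:
    """Return the group whose sequence in `segment` differs least from `target`'s."""
    seqs = segments[segment]
    others = [group for group in seqs if group != target]
    return min(others, key=lambda group: p_distance(seqs[target], seqs[group]))


# Hypothetical alignments for three groups, split into two adjacent segments.
# segment_1: group E carries a stretch it received from group A (recombination).
# segment_2: group E keeps its own ancestry, which here sits closer to B2.
segments = {
    "segment_1": {
        "A": "ACGTACGTAC",
        "E": "ACGTACGTAC",
        "B2": "ACGAACTTAG",
    },
    "segment_2": {
        "A": "TTGACCGATA",
        "E": "TTCACGGCTA",
        "B2": "TTCACGGATA",
    },
}

for seg in segments:
    print(seg, "-> closest relative of E:", closest_relative("E", segments, seg))
```

Running it reports A as E's closest relative for segment_1 and B2 for segment_2: two adjacent pieces of the same genome, two different family trees.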

The complicated family ties suggest that recombination happened often and regularly in E. coli's past. Does it still happen as much today? Apparently not. Leopold saw that DNA is often shared between strains within the same group, but rarely between strains in different groups.

What has changed for E. coli to make recombination rare today? Maybe the different strains found it harder to accept foreign DNA as they adapted to specific niches over time. If genetic exchange is slowing down, and maybe even coming to a halt, E. coli may be turning into many species instead of one.

This would not be the first time E. coli parted ways with its brothers. Almost 140 million years ago, Escherichia and Salmonella became two different species. What this single number hides is a period of 70 million years before the two species truly became two. The oldest genes were separated 180 million years ago, whereas the youngest were still shared until 100 million years ago!

The niche-specific genes of Salmonella and Escherichia were the first to become locked into their respective genomes. These genes simply could not be shared, because they fulfilled a specific role in either Escherichia or Salmonella. More general genes, however, could still be recombined between the two lineages. It took millions of years before these sequences had diverged enough to stop recombining as well.

Nowadays the distinction between Salmonella and Escherichia is clear, but for millions of years this wasn't the case. E. coli might be in the same situation now. Species diverge into strains, which might become species again. We just don't know it yet.


E. coli with flagellum by E.H. White (Creative Commons licence)

Phylogenetic trees from publication 1 (Leopold et al., 2011).

Divergence of Escherichia and Salmonella genes from publication 3 (Retchless & Lawrence, 2007).


Leopold SR, Sawyer SA, Whittam TS, & Tarr PI (2011). Obscured Phylogeny and Recombinational Dormancy in Escherichia coli. BMC Evolutionary Biology, 11 (1) PMID: 21708031

Touchon M, Hoede C, Tenaillon O, Barbe V, Baeriswyl S, Bidet P, Bingen E, Bonacorsi S, Bouchier C, Bouvet O, Calteau A, Chiapello H, Clermont O, Cruveiller S, Danchin A, Diard M, Dossat C, Karoui ME, Frapy E, Garry L, Ghigo JM, Gilles AM, Johnson J, Le Bouguénec C, Lescat M, Mangenot S, Martinez-Jéhanne V, Matic I, Nassif X, Oztas S, Petit MA, Pichon C, Rouy Z, Ruf CS, Schneider D, Tourret J, Vacherie B, Vallenet D, Médigue C, Rocha EP, & Denamur E (2009). Organised genome dynamics in the Escherichia coli species results in highly diverse adaptive paths. PLoS Genetics, 5 (1) PMID: 19165319

Retchless A, & Lawrence J (2007). Temporal Fragmentation of Speciation in Bacteria. Science, 317 (5841), 1093-1096 DOI: 10.1126/science.1144876

Do you want to know more?

I interviewed Shana Leopold and Phillip Tarr, authors of the BMC paper, before writing this blog post. Their joint reply sadly came in too late to incorporate into the post itself, but I still want to share it with you. Here are my questions and their replies.

Since some modern groups of E. coli did share DNA in the recent past, the authors also sketch a second future scenario, in which some groups of E. coli fuse (coalesce) into new groups.

In the introduction you write 'this [...] could reflect the end of E. coli as a species, or herald the coalescence of E. coli groups into new species'. Wouldn't a coalescence of different groups as nascent species also mean that E. coli as a descriptor for the entire species is invalid?

You raise a fascinating question. Take company X with five divisions A, B1, B2, D and E. Divisions A, B1, and E have similar or complementary product lines, use similar technologies, collaborate often, and share customers, and B2 and D also share between themselves to the same extent, but the product lines, technologies and customers have no overlap with those of divisions A, B1, and E. Company X then splits (coalesces its total) into two independent subsidiaries (A-B1-E and B2-D). The A-B1-E and B2-D subsidiaries continue not to work together; they make different products, and have different corporate structure and cultures. However, each has 'inherited' staff and equipment and customers from company X. Well, in this case, does company X exist in these new incarnations, or not? This is the coalescence scenario.

Or suppose company X is composed of five divisions A, B1, B2, D and E. Divisions A, B1, and E have similar or complementary product lines, use similar technologies, collaborate often, and share customers, and B2 and D also share between themselves to the same extent. However, as time goes on, these interactions between A, B1 and E and between B2 and D diminish (as interactions between A, B1, E, B2 and D seem to have diminished in the past), and then interactions even within the divisions diminish. In that case, there is minimal interaction (recombination) seen for any of these employees (organisms) and there is scant fulfillment of the requisites of being a company (species). That is the pre-senescence scenario.

This question touches on how a species is defined, which has been debated within the scientific community. Is a species a group of interacting organisms (freely exchanging DNA), or is it based on genomic relatedness (70% DNA-DNA hybridization or, more recently, >95% nucleotide similarity)? How do you define two groups of organisms that are highly similar but do not exchange DNA?

Do bacterial species exist at all?

It appears that E. coli have functioned as a species to the extent they freely exchanged DNA in the past. But, recently, maybe they haven’t. The applicability of the species concept has been questioned for bacteria, and we hope our data add to the debate. We also hope to introduce the possibility that a set of organisms that we define as a species is constantly evolving, and thus altering the cohesiveness of the group, maybe even changing so much that they no longer fit within the parameters of our species definition.

Will we ever be able to resolve our inability to 'foresee future evolution'? Could we have predicted 140 million years ago that Salmonella and Escherichia would become different species?

Fascinating question you ask: by delineating where sets of bacteria have been (phylogenetically), and localizing where these organisms are in the present, can we predict where these organisms will go in the future? Maybe yes - if we could define with experimental evolution models the frequency and kinds of chromosomal accommodation, and if we could then predict the selective pressures and bottlenecks that will occur, we might be able to formulate predictive models of future evolution. You and I will probably not be around to see if the models are validated in the wild, however.