Bacteria are everywhere: in the air, in the soil, in our bodies and, thanks to human-built rockets, even in space. While the number of bacterial strains and species discovered is continually increasing, some bacteria, particularly environmental ones, are very difficult to work with. These so-called 'unculturable' bacteria don't grow under standard laboratory conditions, which makes them very hard to characterise and understand.

One of the main ways to detect unculturable bacteria is a process often called metagenomic sequencing: whole genome sequencing applied to an entire environmental sample. This takes a sample of (say) seawater and sequences all the DNA present in it. Some of the DNA will be from culturable strains, and these can be identified. Other bits of the DNA will match no known species, and come from bacteria that can't grow in lab conditions.

This technique is very useful if you want to find out whether any unculturable bacteria are present, but it doesn't tell you much about the bacteria themselves. To learn more, you have to assemble the complete genome of each bacterium from all the little bits of sequenced DNA in your sample. It's like doing a jigsaw puzzle when the pieces of thousands of different jigsaws have been mixed together, and a lot of them show very similar pictures.
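To make the jigsaw analogy a little more concrete, here is a minimal sketch of the idea behind assembly: keep joining the two fragments that overlap the most until nothing overlaps any more. It is a toy illustration only. The function names, the example reads and the minimum overlap of 3 letters are all made up for this post, and real assemblers use far more sophisticated methods that cope with sequencing errors, repeats and millions of reads.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of `a` that is also a prefix of `b`.

    Returns 0 if the best overlap is shorter than `min_len`.
    """
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def greedy_assemble(reads, min_len=3):
    """Toy assembler: repeatedly merge the pair of fragments with the largest overlap."""
    frags = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(frags) > 1:
        best_n, best_i, best_j = 0, None, None
        for i, a in enumerate(frags):
            for j, b in enumerate(frags):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            break  # nothing left that overlaps: stop with several pieces
        merged = frags[best_i] + frags[best_j][best_n:]
        frags = [f for k, f in enumerate(frags) if k not in (best_i, best_j)]
        frags.append(merged)
    return frags


# Hypothetical reads, invented for illustration.
reads = ["TACCTGA", "ATGGCGT", "GCGTACC"]
contigs = greedy_assemble(reads)
print(contigs)
```

With reads from many different species mixed together, fragments from different genomes can overlap by chance, which is exactly the "similar pictures" problem in the jigsaw analogy.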

Separating a single cell from the rest of the sample makes the problem simpler, but there is still a massive issue of scale: one cell does not contain much DNA. With well-behaved bacteria the answer is to grow lots and lots of cells to increase the amount of DNA present, and this usually gives you around 95% of the genes (the other 5% are usually repeat sequences, analogous to a large section of jigsaw that is entirely sky, in which all the pieces are the same shape). From a single unculturable bacterium, however, it is often difficult to recover more than 70% of the genome.

Researchers from UC San Diego, the J. Craig Venter Institute and Illumina Inc. have been working on ways to improve on this figure. Their approach (reference below) builds on a technique called Multiple Displacement Amplification (MDA), which works by amplifying sections of the genome before the DNA is sent off for sequencing. In this way the amount of DNA present can be increased without having to grow the cells in culture. MDA has been around since 2005, but has usually caused problems for sequencing programs, as not all the DNA copies are accurate and the amplification is not uniform. Most sequencing programs deal badly with a sequence where some sections are amplified several billion times while others are amplified fewer than twenty times.
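A small sketch can show why uneven amplification trips up software. A common trick is to count how often each short stretch of DNA (a "k-mer") turns up and to throw away anything seen only a few times, on the assumption that it is an error. The example below is an illustration of the problem under made-up numbers, not the method from the paper: the sequences, the copy counts and the cutoff of 5 are all hypothetical.

```python
from collections import Counter


def kmer_counts(reads, k):
    """Count every substring of length k across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def keep_above_cutoff(counts, cutoff):
    """Keep only k-mers seen at least `cutoff` times (a single, fixed threshold)."""
    return {kmer for kmer, n in counts.items() if n >= cutoff}


# Hypothetical toy data: region A was amplified heavily, region B barely at all.
region_a = "ATGGCGTACC"
region_b = "TTGACCGGAT"
error_read = "ATGGCTTACC"  # a copy of region A with one wrong letter

reads = [region_a] * 1000 + [region_b] * 2 + [error_read]

counts = kmer_counts(reads, k=5)
kept = keep_above_cutoff(counts, cutoff=5)

a_kmers = set(kmer_counts([region_a], 5))
b_kmers = set(kmer_counts([region_b], 5))

print("region A kept:", a_kmers <= kept)
print("region B lost:", kept.isdisjoint(b_kmers))
```

In this toy case the fixed cutoff removes the error, but it also removes every trace of region B, which is genuine DNA that simply happened to be amplified very little. Software that copes with MDA data has to tell those two situations apart.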

By changing the algorithms that work on the MDA data, the researchers managed to significantly improve how much of the genome can be recovered by sequencing MDA-amplified DNA. To test the approach, they let the algorithm loose on a single cell of previously-characterised E. coli, and recovered 91% of its genes. This improvement makes it a lot easier to characterise and understand the unculturable bacteria that exist in the environment, and to find out what exciting genetic secrets they could be concealing.


Chitsaz H, Yee-Greenbaum JL, Tesler G, Lombardo MJ, Dupont CL, Badger JH, Novotny M, Rusch DB, Fraser LJ, Gormley NA, Schulz-Trieglaff O, Smith GP, Evers DJ, Pevzner PA, & Lasken RS (2011). Efficient de novo assembly of single-cell bacterial genomes from short-read data sets. Nature Biotechnology. PMID: 21926975