According to the New York Times, synthetic biology is creating DNA out of thin air. A recent article about synthetic biology and consumer goods describes DNA synthesis as a process where "DNA is created on computers and inserted into organisms." Computers are pretty cool and really useful in synthetic biology labs, but it takes a lot more than a computer to turn a text file full of A's, T's, C's, and G's into DNA.

I don't want to just nitpick science journalists here, because overextended analogies to computers and other "futuristic" devices are rampant in synthetic biology, hiding the realities of engineering biology with the language of bits and circuits. In 2010 Craig Venter announced that his research group had sequenced and re-synthesized the genome of a small bacterium, creating what he called “the first self-replicating species we’ve had on the planet whose parent is a computer.” Later, Venter announced DNA synthesis was like "3D printing" organisms, and that they were working on a "fax machine" to copy life on Mars.

These analogies make the flip-flop between living organism and DNA sequence files seem as simple as clicking "print." Sometimes it's actually described that way -- over at Grist, Nathanael Johnson says that synthetic biology is about creating genes from scratch: "you type in the DNA that you want, print it out, and splice that into yeast." Indeed, for the average lab worker, ordering custom DNA sequences from a synthesis company has become a routine task: we enter sequences into an online form and receive the DNA in the mail a few days later. But how do we figure out what DNA to type, and what happens between clicking "order now" and receiving DNA in the mail? Where does DNA come from?

If you wish to make a DNA sequence "from scratch" you're probably not going to get your protein to fold. Very few researchers who fall under the umbrella of synthetic biology are actually working on designing genes completely "from scratch" or, as scientists would say, "de novo" -- not based on any existing genes found in other organisms. For a 100 amino acid peptide built from the 20 standard amino acids, there are 20^100 (roughly 10^130) possible sequences, far more than there are atoms in the observable universe. Of those possible sequences, only a tiny fraction will be able to fold into a three-dimensional protein shape, and of those, an even smaller fraction will have a biological function. In the past two decades, the field has identified "several" de novo sequences using a combination of computational methods for predicting small-scale protein folding and combinatorial chemistry to select for peptides with any function at all.
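To put numbers on that comparison, here is a quick back-of-the-envelope check in Python. It is only a sketch: it assumes the 20 standard amino acids and uses a rough, commonly quoted order-of-magnitude figure of 10^80 atoms in the observable universe, and the names in it are made up for illustration.

```python
# Back-of-the-envelope count of possible 100-residue peptides.
AMINO_ACIDS = 20              # assumption: the 20 standard amino acids
PEPTIDE_LENGTH = 100
ATOMS_IN_UNIVERSE = 10 ** 80  # assumption: rough order-of-magnitude estimate


def sequence_count(length, alphabet=AMINO_ACIDS):
    """Number of distinct sequences of the given length."""
    return alphabet ** length


def order_of_magnitude(n):
    """Power of ten of a positive integer (digit count minus one)."""
    return len(str(n)) - 1


total = sequence_count(PEPTIDE_LENGTH)
print(f"possible sequences: about 10^{order_of_magnitude(total)}")
print(f"sequences per atom: about 10^{order_of_magnitude(total // ATOMS_IN_UNIVERSE)}")
```

Even if every atom in the universe were a different peptide, you would cover only a vanishing sliver of the possibilities, which is why nobody finds working proteins by trying sequences at random.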

Most synthetic biologists are not designing gene sequences this way; the field was founded on the idea that we shouldn't have to build new genetic engineering projects from scratch at all. Instead, synthetic biologists want to build new connections between genes from the library of existing genes that have been sequenced and characterized by other biologists and engineers. Synthetic biologists mix and match genes from different organisms, or alter parts of genes to change how the proteins are expressed or how they function in a living cell. This mix and match of DNA sequences can be built using the "traditional" genetic engineering tools of cutting and pasting DNA using enzymes, or outsourced to a synthesis company that will use chemistry to create the DNA molecule.

If you wish to make a DNA molecule from scratch, you must first create some atoms. DNA is an organic chemical molecule made from atoms of carbon, hydrogen, nitrogen, oxygen, and phosphorus. Like many other organic molecules that are made inside living cells, DNA can also be synthesized in test tubes using the tools of organic chemistry. In most descriptions of DNA synthesis technology, we hear that DNA sequences can be made by simply adding together the A's, T's, C's, and G's -- the "bases" that make the rungs of the twisted DNA ladder. But where do those bases come from? What is the supply chain for manufacturing DNA?

Automated chemical synthesis of DNA begins with DNA bases that have been modified chemically to protect the highly reactive parts of the molecule from binding to each other and creating unwanted side products. These bases and their protecting groups are each built from other molecules, each with its own series of chemical reactions, feedstocks, and supply chain economics. For example, the "base" part of adenine ("A") and guanine ("G") is a purine ring, which is chemically synthesized by heating formamide at 160-200 degrees Celsius. Formamide is produced through the reaction of carbon monoxide and ammonia. Ammonia is produced by combining nitrogen from the air with hydrogen at high temperature and high pressure. That hydrogen is produced mostly by reacting natural gas with steam, and the natural gas is extracted from underground reserves by cracking rocks with high-pressure liquids.
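The chain in that paragraph can be written down as a small dependency map and walked back to its starting materials. This is a toy sketch, not a real model of chemical manufacturing: it contains only the steps named above, the names `SUPPLY_CHAIN` and `raw_inputs` are made up for illustration, and anything without a listed source (carbon monoxide, for instance) is simply treated as a raw input.

```python
# Each product maps to the inputs named for it in the paragraph above.
# Assumption of this sketch: anything without an entry counts as a raw input.
SUPPLY_CHAIN = {
    "purine ring": ["formamide"],
    "formamide": ["carbon monoxide", "ammonia"],
    "ammonia": ["nitrogen", "hydrogen"],
    "hydrogen": ["natural gas"],
    "natural gas": ["underground reserves"],
}


def raw_inputs(product, chain=SUPPLY_CHAIN):
    """Return a sorted list of the inputs that have no upstream entry."""
    upstream = chain.get(product)
    if upstream is None:
        return [product]
    found = set()
    for item in upstream:
        found.update(raw_inputs(item, chain))
    return sorted(found)


print(raw_inputs("purine ring"))
# ['carbon monoxide', 'nitrogen', 'underground reserves']
```

Four steps upstream of a single ring in two of the four bases, and we are already at a gas well. The protecting groups, solvents, and sugar-phosphate backbone each have chains of their own.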

In synthetic biology, the physical reality of DNA as a chemical is analogous to the transistors that make up computer chips, and its raw sequence of bases to the "assembly code." These are layers that most programmers don't have to think about when they design software, just as most synthetic biologists don't necessarily think about how the DNA is made when they design metabolic pathways. But this abstraction in the engineering hierarchy doesn't mean that the lower levels aren't important or that they somehow happen on their own, and certainly not "from scratch."

The debates about the growing synthetic biology industry playing out in the pages of the New York Times and Grist have to do with knowing where the chemicals in our consumer products come from, who makes them and how. These articles are beginning to illuminate for a broader public how the processes of synthetic biology are becoming part of the enormously complex global supply chain of chemicals. While we're revealing the growing role of synthetic biology in industry, let's not define the field with language that hides how synthetic biology itself is made.