Why We Should Finish the Human Genome

About one percent of our DNA hasn't yet been mapped--and it could contain information crucial to our functioning and health

This article was published in Scientific American’s former blog network and reflects the views of the author, not necessarily those of Scientific American


The consortium of governments and institutions involved in the Human Genome Project should commit to finishing the “finished” genome. Our recent experiments indicate that the last unsequenced portions contain genes and other elements that may be important to our health.1

In April 2003, the human genome was declared complete, but a close reading shows that the actual claim was that the genome is “complete in nearly every functional way” or “is as complete as it can be.”2 By most estimates this means it is about 99 percent done, and the focus since 2003 has been on sequencing more genomes, not on finishing the last one percent of ours. But what if this last one percent is “functional,” and we are also missing one percent of the estimated 20,000 to 25,000 genes,3 or 200 or more genes? The most important reason to finish the genome is that these “missing” genes will never be studied until they are found.
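
The estimate of 200 or more missing genes is simple arithmetic, and it can be written out explicitly. This is only a sketch, assuming genes are spread evenly so that one percent of the sequence holds one percent of the genes; the function name is made up for illustration.

```python
def missing_gene_estimate(total_genes, percent_missing=1):
    """Estimate how many genes sit in the unsequenced fraction.

    Assumption (not a measured result): genes are spread evenly,
    so X percent of the sequence holds X percent of the genes.
    """
    return total_genes * percent_missing // 100

# The gene-count range quoted in the text.
for total in (20_000, 25_000):
    print(total, "genes ->", missing_gene_estimate(total), "possibly missing")
```

For the quoted range this works out to roughly 200 to 250 genes.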

The last one percent has not been finished because it is really hard. It is composed mainly of repetitive sequences, like CAG-CAG-CAG-CAG-CAG-CAG..., that go on for millions of DNA letters. These repeats confuse the sequencing technology and software, a problem we have overcome. The last one percent is important because in between all those hard repetitive sections lie undiscovered genes, some of which we have now found. All gene-containing sequences go into something called the “reference genome,”4 which scientists use to search for the genetic contributions to diseases or traits. If a gene is not in the reference, it never gets studied. So our discovery of these genes will enable many others to study them. Others also appreciate the need to truly finish the human genome, and some have succeeded in localizing (mapping) large clones to highly repetitive regions. Our approach bypasses the mapping step and goes directly to next-generation sequencing.
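
To make the idea of a repetitive run concrete, here is a minimal sketch that scans a DNA string for a short unit repeated back to back, like the CAG example. It is a toy illustration, not the method used in our experiments; the function name, its parameters, and the example sequence are all made up for this sketch.

```python
import re

def find_tandem_repeats(seq, unit_len=3, min_copies=4):
    """Return (start, unit, copies) for each run of a repeated unit.

    A run is a unit of `unit_len` letters repeated back to back
    at least `min_copies` times, e.g. CAGCAGCAGCAG.
    """
    # ([ACGT]{k}) captures one unit; \1{n-1,} demands more copies of it.
    pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_copies - 1))
    hits = []
    for match in pattern.finditer(seq.upper()):
        copies = len(match.group(0)) // unit_len
        hits.append((match.start(), match.group(1), copies))
    return hits

# Hypothetical toy sequence: six CAG copies flanked by unique letters.
example = "TTGA" + "CAG" * 6 + "TCCA"
print(find_tandem_repeats(example))
```

Real repeat regions run for millions of letters, far longer than a single sequencing read, which is part of why a simple scan like this is not enough to place them in the genome.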


And there is one thing about nature: it is efficient. So there is a reason why these repeat regions exist (even if we don’t know what it is yet), and there is a function for these genes (again, even if we don’t know what that function is yet, or what happens when the gene is broken). There is still a debate about the importance of the rest of the genome, sometimes called “junk DNA” or “dark matter DNA.” But I think the prevailing scientific opinion is that the intergenic (between-gene) regions are important and contain other important elements, though we haven't yet figured out much about these regions.

It is likely that most of these missing pieces of the genome are in the centers and at the ends of our chromosomes, regions called centromeres and telomeres.5,6 These regions play an important role in DNA replication (cell division) and gene protection, and they are known to be involved in certain diseases. Telomeres are mainly made up of repetitive sequences, which scientists have noticed get shorter as cells age, or in some cancers. So many scientists think they serve as a buffer, so that the erosion that occurs with age does not destroy any genes. Our results matter here for two reasons. First, if there are genes in there, which we have now discovered, they may play a role in aging (and diseases of the aged). Second, to date all groups have studied one repeat sequence that is six letters long (GGGTTA); in our experiments, however, we found five-letter sequences (AATGG and GTGGA) that are 150 times more abundant than the often-studied six-letter sequence. So we think that aging and diseases of the aged, like cancer, may be better explained now that we can look in the right place. Again, these missing parts cannot be studied until they are “discovered.”
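
A statement like “150 times more abundant” is a ratio of motif counts. The sketch below shows that comparison on a made-up toy sequence; it assumes simple non-overlapping counting on one strand, and the function names and data are hypothetical, not our actual analysis or results.

```python
def motif_counts(seq, motifs):
    """Count non-overlapping occurrences of each motif in seq."""
    seq = seq.upper()
    return {motif: seq.count(motif) for motif in motifs}

def fold_abundance(counts, motif, baseline):
    """How many times more often `motif` occurs than `baseline`."""
    if counts[baseline] == 0:
        return None  # ratio is undefined without a baseline count
    return counts[motif] / counts[baseline]

# Toy data only: 30 copies of AATGG and 2 copies of GGGTTA.
toy = "AATGG" * 30 + "ACGT" + "GGGTTA" * 2
counts = motif_counts(toy, ["AATGG", "GGGTTA"])
print(counts, fold_abundance(counts, "AATGG", "GGGTTA"))
```

The ratio in this toy case is fixed by how the sequence was built; the point is only to show what is being compared.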

It is time to recommit to finishing the genome. Sequencing technology has gotten better, and “tricky” experimental approaches, such as ours, can help. What if one of the undiscovered genes is really important, but nobody is looking at it? The sooner we truly finish the human genome, the sooner we will truly be able to exploit it for understanding and health.

 

References:

  1. Genomic leftovers: identifying novel microsatellites, over-represented motifs and functional elements in the human genome. Natalie C. Fonville, Karthik Raja Velmurugan, Hongseok Tae, Zalman Vaksman, Lauren J. McIver & Harold R. Garner, Scientific Reports 6, Article number: 27722 (2016), doi:10.1038/srep27722

  2. https://www.genome.gov/11006943/human-genome-project-completion-frequently-asked-questions/

  3. https://en.wikipedia.org/wiki/Human_genome

  4. https://en.wikipedia.org/wiki/Reference_genome

  5. https://en.wikipedia.org/wiki/Centromere

  6. https://en.wikipedia.org/wiki/Telomere