The End of “Junk DNA”?

By: Sean Pitman

Ever hear that only 2-5% of the human genome is functional? That all the rest is “junk DNA”, the remnants of our long evolutionary heritage? Ever wonder how we can really be 99% the same as an ape, 98% the same as a mouse, and 50% the same as a banana? Well, it turns out that mainstream scientists are finally discovering what creationists have long claimed – that protein-coding genes are not the only functional elements in our genome. Those vast regions of non-coding DNA actually do something. In fact, it is being discovered that the protein-coding genes are arguably the simplest aspect of our genome, the basic bricks and mortar so to speak. The really important information resides elsewhere, in the instructions that direct how very similar bricks and mortar can be used to build anything from a banana to a mouse to you and me.

The science journal Nature just published a very interesting news feature along these lines (ENCODE: The human encyclopaedia, Sept 5, 2012).  This article reports on the ongoing human genome project called the “Encyclopedia of DNA Elements” or ENCODE project.  The scientists at ENCODE made a very startling, and very controversial, claim – that at least 80% of our genome is functional to one degree or another!

The consortium has assigned some sort of function to roughly 80% of the genome, including more than 70,000 ‘promoter’ regions — the sites, just upstream of genes, where proteins bind to control gene expression — and nearly 400,000 ‘enhancer’ regions that regulate expression of distant genes. But the job is far from done, says Birney, a computational biologist at the European Molecular Biology Laboratory’s European Bioinformatics Institute in Hinxton, UK, who coordinated the data analysis for ENCODE. He says that some of the mapping efforts are about halfway to completion, and that deeper characterization of everything the genome is doing is probably only 10% finished…

The vast desert regions have now been populated with hundreds of thousands of features that contribute to gene regulation. And every cell type uses different combinations and permutations of these features to generate its unique biology. This richness helps to explain how relatively few protein-coding genes can provide the biological complexity necessary to grow and run a human being…

With thousands of cell types to test and a growing set of tools with which to test them, the project could unfold endlessly. “We’re far from finished,” says geneticist Rick Myers of the HudsonAlpha Institute for Biotechnology in Huntsville, Alabama. “You might argue that this could go on forever.”

 

Of course, many scientists have responded rather strongly against this 80% functionality number (Link). And, the truth of the matter is that, while non-coding DNA probably does represent the blueprint for higher organisms, directing to a significant degree how protein-coding DNA functions, this does not necessarily, or even likely, mean that most non-coding DNA is actually required – or even useful. After all, some ferns and salamanders have genomes the same size as or smaller than the human genome, while other ferns and salamanders have genomes 50 times the size of the human genome (the human genome comprises ~3.5 billion bases). For additional comparisons, consider that a chicken’s genome contains about 1.3 billion bases, a clam’s about 3.2 billion, some frogs have 6.5 billion, and a ladybug genome has about 0.3 billion (~300 million) bases – similar to the genome of a Japanese pufferfish, which is about one-eighth the size of the human genome (just 385 million base pairs compared to 3 billion base pairs for humans), yet does just fine. It is clearly impossible to guess the genome size of an organism just by looking at its apparent “complexity” or “simplicity”, or at the number of protein-coding “genes” in its genome, which doesn’t seem to correlate with the overall size of eukaryotic genomes. This curious fact is currently known as the C-value enigma.

Given the reality of the C-value enigma, it seems likely that most non-coding DNA is not vital for life or even beneficially functional. For example, consider the argument of Dr. Ryan Gregory known as “The Onion Test”, with additional commentary from Dr. Larry Moran (Link):

The onion test is a simple reality check for anyone who thinks they have come up with a universal function for non-coding DNA. Whatever your proposed function, ask yourself this question: Can I explain why an onion needs about five times more non-coding DNA for this function than a human? The onion, Allium cepa, is a diploid (2n = 16) plant with a haploid genome size of about 17 pg. Human, Homo sapiens, is a diploid (2n = 46) animal with a haploid genome size of about 3.5 pg. This comparison is chosen more or less arbitrarily (there are far bigger genomes than onion, and far smaller ones than human), but it makes the problem of universal function for non-coding DNA clear. Further, if you think perhaps onions are somehow special, consider that members of the genus Allium range in genome size from 7 pg to 31.5 pg. So why can A. altyncolicum make do with one fifth as much regulation, structural maintenance, protection against mutagens, or [insert preferred universal function] as A. ursinum?
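To make the arithmetic behind the onion test concrete, here is a rough sketch in Python using only the picogram figures quoted above. The conversion factor of roughly 978 million base pairs per picogram of DNA is a commonly used approximation and is not part of the quoted text; the function and variable names are made up for illustration.

```python
# Rough sketch of the onion test arithmetic.
# Assumption (not from the quoted text): 1 picogram of DNA is roughly
# 978 million base pairs, a commonly used approximate conversion.
BP_PER_PG = 0.978e9

# Haploid genome sizes in picograms, as quoted in the onion test.
genome_pg = {
    "human (Homo sapiens)": 3.5,
    "onion (Allium cepa)": 17.0,
    "smallest Allium cited": 7.0,
    "largest Allium cited": 31.5,
}


def pg_to_bp(picograms):
    """Convert a genome size in picograms to an approximate base-pair count."""
    return picograms * BP_PER_PG


def ratio_to_human(picograms):
    """Express a genome size as a multiple of the quoted human genome size."""
    return picograms / genome_pg["human (Homo sapiens)"]


# Print each genome in billions of base pairs and as a multiple of human.
for name, pg in genome_pg.items():
    billions = pg_to_bp(pg) / 1e9
    print(f"{name:24s} {billions:5.1f} billion bp  {ratio_to_human(pg):4.1f}x human")
```

On these figures the onion comes out at a bit under five times the human genome, and the largest Allium genome cited is about four and a half times the smallest, which is the spread the onion test asks any proposed universal function to explain.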

However, there is the problem of the expense of maintaining stretches of DNA for long periods of time that don’t provide any useful advantage to the organism.  Such maintenance might seem to be fairly expensive if there is no return on the investment.

The counterargument, as presented by Dr. Nick Matzke, is that such maintenance really isn’t very expensive at all relative to the other costs that the organism must pay. What’s a few pennies here and there when you’re spending thousands of dollars every day? Well, over millions of years a few pennies here and there might seem to add up to quite a lot. And, in a dog-eat-dog world, this could make all the difference. To add to the credibility of this observation, consider a paper by Holloway et al. (2007) in which the authors observed a significant survival cost in various environments for bacteria that carried extra non-beneficial copies of DNA (Link). However, the argument is that the cost is much higher for single-celled organisms than for multi-celled organisms, which include most eukaryotes. Along this line, consider two papers published in 1980 by Orgel and Crick (Link) and by Doolittle and Sapienza (Link), which argue that “selfish DNA” elements, such as transposons, essentially act as molecular parasites, replicating and increasing their numbers at the relatively slight expense of a host genome – so slight that natural selection simply can’t keep up with the rate of expansion of these self-replicating elements within the genome.

However, a few scientists have suggested various functional options that might help to explain, to at least some degree, the C-value enigma.  For example, non-coding DNA seems to act as a sort of clock to regulate the timing of expression of various genes and genetic elements during development (Swinburne, 2010). There is also the interesting discovery that the initiation of DNA replication and the transition from G1 to S is dependent upon nuclear volume.  “Replication appears to initiate and terminate at the nuclear periphery and require a critical nuclear volume for onset (Nicolini et al., 1986); G1 nuclear volume growth must depend on concerted expansion of both chromatin and the nuclear envelope.” (Cavalier-Smith, 2004)  In short, “A genome’s sheer bulk can influence the rate of cell division and thereby that of development.” (Link)

Of course, the obvious counter is that such functionality for repetitive DNA is not dependent on the nature of the sequence itself, but only upon the absolute size of the sequence.  And, while this appears to be true, having the right size in just the right place can obviously be quite beneficial.  In other words, on occasion, size does matter…

This isn’t all, of course.  There are times when the actual specificity of the sequence matters as well – and this is what has also been discovered about non-coding DNA.  It appears to be the blueprint that controls how the building blocks (i.e., the protein-coding genes) are used.  In other words, non-coding DNA does seem to be more important than the protein-coding genes themselves. It seems, for instance, that it is the non-coding DNA that determines if a mouse or a pig or a monkey or a human is to be built given a set of very similar protein-coding genes for each of these types of creatures (for further discussion of this topic see: Link).

“I think this will come to be a classic story of orthodoxy derailing objective analysis of the facts, in this case for a quarter of a century,” Mattick says. “The failure to recognize the full implications of this – particularly the possibility that the intervening noncoding sequences may be transmitting parallel information in the form of RNA molecules – may well go down as one of the biggest mistakes in the history of molecular biology.”

W. Wayt Gibbs, “The Unseen Genome: Gems Among the Junk,” Scientific American (Nov. 2003).

 

A genome in which the non-coding DNA controls the coding DNA has important implications for evolutionary theories. Not the least of these is that the detrimental mutation rate suffered by all slowly reproducing gene pools (like those of all mammals, for instance) is higher than anyone within mainstream science has ever imagined before – far higher than any known naturalistic mechanism can deal with when it comes to avoiding the inevitable deterioration of our entire gene pool over time toward eventual genetic meltdown and extinction. This only confirms what those like geneticist John Sanford have been saying for some time now – that we’re not evolving, but devolving (Link). It also highlights the fact that the functional differences between us and other animals, like apes, are found not so much in the protein-coding genes as in the non-coding regions of our genomes (Link).

The very best that any naturalistic evolutionary mechanism can achieve is a moderate slowdown in the inevitable deterioration of the quality of information within our gene pool and the gene pools of all other slowly reproducing species. It certainly cannot generate novel genetic information beyond very low levels of functional complexity, and it can’t even get rid of the detrimental mutations that enter the gene pool fast enough to stay functionally neutral – regardless of the degree of selection pressure applied. I’m afraid it’s turtles all the way down… and it always has been since the very beginning of life on this planet.