How much of the human genome is actually functional? This has been a matter of heated debate since 2012, when the ENCODE scientists announced their estimate of ~80% functionality for the human genome (ENCODE: The human encyclopaedia, Sept 5, 2012). This initial estimate was strongly challenged and even mocked by numerous scientists (Link). Then, a couple of years later, researchers at the University of Oxford, UK, concluded that only about 8.2% of the human genome is shaped by natural selection. The rest, they argued, is non-functional (Rands, 2014). While the 80% figure did seem high, the 8% figure seemed a bit low to a number of scientists as well (Link).
Patrik D’haeseleer, a computational biologist at Lawrence Livermore National Laboratory, California, tweeted: “only between 8% and 80% of human #genome is functional. Glad we’ve got that sorted out.” Of course, at the heart of the issue are differing definitions of “function.” Erick Loomis, an epigeneticist at Imperial College London, tweeted: “Maybe we should stop using ‘functional’ if we can’t find a common definition.” (Link).
Dr. John Greally, an epigeneticist at the Albert Einstein College of Medicine of Yeshiva University in New York City, argued that the Rands et al. paper “missed an opportunity to explore why certain sequences, especially those known as transcription factor binding sites, are under such low evolutionary pressure, even though they presumably have important biological roles. Instead, the authors emphasized the supposed discrepancy with ENCODE. The paper appears to be in use as a bludgeon with which to hammer the ENCODE project, not necessarily by the authors, but by others.” (Link)
This is all very interesting because it has been known for some time, as Dr. Greally points out, that non-conserved sequences (based on assumed evolutionary relationships) can still be functional. For example, Kellis (2014) argues that:
The lower bound estimate that 5% of the human genome has been under evolutionary constraint was based on the excess conservation observed in mammalian alignments relative to a neutral reference (typically ancestral repeats, small introns, or fourfold degenerate codon positions). However, estimates that incorporate alternate references, shape-based constraint, evolutionary turnover, or lineage-specific constraint each suggests roughly two to three times more constraint than previously (12-15%), and their union might be even larger as they each correct different aspects of alignment-based excess constraint…. Although still weakly powered, human population studies suggest that an additional 4-11% of the genome may be under lineage-specific constraint after specifically excluding protein-coding regions.
This means that, at minimum, between 16% and 26% of the genome is likely to be functionally constrained to one degree or another (12-15% plus the additional 4-11%). This, in turn, means that the likely detrimental mutation rate is at least four times as high as the Ud = 2.2 rate that Keightley suggested in 2012 (and some would argue even higher), i.e., at least 8.8 detrimental mutations per offspring per generation. That would imply a necessary reproductive rate of over 13,200 offspring per woman per generation just for natural selection to keep up with and effectively eliminate all these detrimental mutations (and a necessary death rate of more than 99.98% per generation) (Link).
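The arithmetic behind these figures can be sketched as follows. This is only a minimal sketch under a stated assumption: that new detrimental mutations per offspring follow a Poisson distribution, so the fraction of offspring with no new detrimental mutation is e^(-Ud). The text above does not spell out its model, but the figure of over 13,200 offspring is consistent with 2 × e^(Ud) at Ud = 8.8. The function names are illustrative only.

```python
import math


def constrained_range(base=(12.0, 15.0), lineage_specific=(4.0, 11.0)):
    """Add the lower bounds and the upper bounds (percent of the genome)."""
    return (base[0] + lineage_specific[0], base[1] + lineage_specific[1])


def mutation_load(u_d):
    """Return (mutation-free fraction, offspring needed per woman, death rate).

    Assumption (not stated in the article): new detrimental mutations per
    offspring are Poisson-distributed with mean u_d, so the chance of an
    offspring carrying none of them is exp(-u_d).
    """
    mutation_free = math.exp(-u_d)
    # Two mutation-free survivors per woman keep the population size stable.
    offspring_per_woman = 2.0 / mutation_free
    death_rate = 1.0 - mutation_free
    return mutation_free, offspring_per_woman, death_rate


low, high = constrained_range()
print(f"Constrained fraction of the genome: {low:.0f}% to {high:.0f}%")

u_d = 4 * 2.2  # four times the Ud = 2.2 estimate cited above
free, offspring, death = mutation_load(u_d)
print(f"Ud = {u_d:.1f}")
print(f"Offspring needed per woman: {offspring:,.0f}")
print(f"Required death rate: {death:.2%}")
```

Under this assumption the sketch gives roughly 13,270 offspring per woman and a death rate of about 99.98% per generation, in line with the figures quoted above.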
So, no wonder this is such a hot button issue for neo-Darwinists. A whole lot is riding on how much of the human genome is actually functional. And, who knows, it may turn out that the answer to the “c-value enigma”, and to why many functional elements within the human genome are not significantly constrained by natural selection, lies at least in part in various forms of redundancy of functional elements within the genome. In other words, while various sections of non-coding DNA may be functional, there may also be other redundant copies of these functional sequences or various complex networks capable of maintaining a given function within the genome. This would mean, of course, that any one particular functional sequence within such a redundant genome could sustain numerous mutations without natural selection taking significant notice. Not until the functional redundancy was depleted within the genome would there be a significant functional deficit for the organism and a need for natural selection to step up to the plate.
Of course, this isn’t a new argument. In fact, this very same argument was used by Kellis et al. in 2014:
The approach [where functionality is based only on homologous or “constrained” sequences between various species, or on tests that measure immediate “loss of function”, as in, for example, “knockout mice” where various genetic segments are deleted from the mouse genome] may also miss elements whose phenotypes occur only in rare cells or specific environmental contexts, or whose effects are too subtle to detect with current assays. Loss-of-function tests can also be buffered by functional redundancy, such that double or triple disruptions are required for a phenotypic consequence. Consistent with redundant, contextual, or subtle functions, the deletion of large and highly conserved genomic segments sometimes has no discernible organismal phenotype and seemingly debilitating mutations in genes thought to be indispensible have been found in the human population.
Clearly then, this observation significantly undermines the impact of the conclusions of Rands et al., because their conclusion of overall human genomic functionality of just 8.2% is based entirely on “constrained” sequence homologies between different mammalian species. Rands doesn’t take into account the possibility of functional aspects of DNA that would not be significantly constrained between or even within various species. In fact, it’s been known in a general way for some time now that there is a lot of redundancy in the human genome, since most genes and other functional genetic elements have at least two copies within the genome, with some having several dozen or even several hundred copies. The human genome is in fact a very “repetitive landscape.” Of course, some biologists considered the repetition either superfluous or sort of a “backup supply” of DNA. While it is true that some of the repetition within the human genome could just be ‘extra DNA’, new research is also suggesting that such redundant sequences may have a variety of more direct functional roles (Link, Link). Genetic redundancy is the key to the robustness of organisms, i.e., their built-in flexibility to rapidly adapt to different environments. It is also right in line with very good design. Consider, for example, the arguments of David Stern (HHMI investigator) in this regard:
Over the past 10 to 20 years, research has shown that instructional regions outside the protein-coding region are important for regulating when genes are turned on and off. Now we’re finding that additional copies of these genetic instructions are important for maintaining stable gene function even in a variable environment, so that genes produce the right output for organisms to develop normally. (Frankel et al., 2010).
For example, in 2008, the University of California-Berkeley’s Michael Levine reported the discovery of secondary enhancers for a particular fruit fly gene that were located much farther away from the target genes and from the previously discovered enhancers that were located adjacent to the gene. “Levine’s team called the apparently redundant copies in distant genetic realms ‘shadow enhancers’ and hypothesized that they might serve to make sure that genes are expressed normally, even if development is disturbed. Factors that might induce developmental disturbances include environmental conditions, such as extreme temperatures, and internal factors, such as mutations in other genes.”
So, Stern and his team put Levine’s hypothesis to the test by studying a fruit fly gene that codes for the production of tiny hair-like projections on the insect’s body, which are called trichomes. “The gene, known as shavenbaby, takes its name from the fact that flies with a mutated copy of the gene are nearly hairless. Stern previously led a research effort that identified three primary enhancers for shavenbaby. In the new research, his team discovered two shadow enhancers for shavenbaby, located more than 50,000 base pairs away from the gene.
In their experiments, the researchers deleted these two shadow enhancers, leaving the primary enhancers in place, and observed developing fly embryos under a range of temperature conditions. At optimal temperatures for fruit fly development — around 25 degrees Celsius, or a comfortable 77 degrees Fahrenheit — the embryos without shadow enhancers had only very slight defects in their trichomes. But the results were very different when the researchers observed embryos that developed at temperatures close to the extremes at which developing fruit flies can survive — 17 degrees Celsius, or 63 degrees Fahrenheit, on the low end and 32 degrees Celsius, or 90 degrees Fahrenheit, at the upper limit. These flies without shadow enhancers developed with severe deficiencies in the number of trichomes produced.” (Link)
These results indicate that the genetic instructions that seemed dependable at optimal temperatures were just not up to the task in other conditions, Stern said. (Link)
“Backup regulatory DNAs, also called shadow enhancers, ensure the reliable activities of essential genes such as shavenbaby even under adverse conditions, such as increases in temperature,” Levine said. “If Dr. Stern and his associates had not examined the activities of shavenbaby under such conditions, then the shadow enhancers might have been missed since they are not needed when fruit flies are grown at optimal culturing conditions in the laboratory.” (Link)
Simply by knocking genes out we don’t necessarily reveal function, because the network may buffer what is happening. So you may need to do two knockouts or even three before you finally get through to the phenotype. … If one network doesn’t succeed in producing a component necessary to the functioning of the cell and the organism, then another network is used instead. So most knockouts and mutations are buffered by the network… Now that doesn’t mean to say that these proteins that are made as a consequence of gene templates for them don’t have a function. Of course they do. If you stress the organism you can reveal the function. … If the organism can’t make product X by mechanism A, it makes it by mechanism B. (Link)
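The buffering described in these quotes can be illustrated with a toy model. This is only a sketch under loud assumptions: the three mechanisms "A", "B" and "C" and the `required_intact` thresholds are hypothetical, chosen for illustration, and do not correspond to any real gene network or measured data.

```python
from itertools import combinations

# Hypothetical redundant mechanisms that can each make the same product "X".
MECHANISMS = ("A", "B", "C")


def has_phenotype(knocked_out, required_intact=1):
    """Return True if a loss-of-function phenotype would be visible.

    required_intact is the (assumed) number of working mechanisms the
    organism needs: 1 under optimal conditions, more under stress.
    """
    intact = [m for m in MECHANISMS if m not in knocked_out]
    return len(intact) < required_intact


def count_visible(n_knockouts, required_intact=1):
    """Count knockout combinations of a given size that show a phenotype."""
    combos = list(combinations(MECHANISMS, n_knockouts))
    visible = sum(has_phenotype(set(c), required_intact) for c in combos)
    return visible, len(combos)


for label, required in (("optimal", 1), ("stressed", 3)):
    for n in range(1, len(MECHANISMS) + 1):
        visible, total = count_visible(n, required)
        print(f"{label}: {n} knockout(s) -> {visible} of {total} show a phenotype")
```

In this toy model a single or double knockout is invisible under optimal conditions, and only the triple knockout shows a phenotype. When the assumed stress condition requires all three mechanisms, every single knockout becomes visible, which mirrors the point that stressing the organism can reveal function.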
Of course, the very existence of genetic buffering, and the functional redundancies required for it, presents a paradox in light of evolutionary concepts. On one hand, for genetic buffering to take place there must be redundancies of gene function. Yet, on the other hand, such redundancies are clearly unstable in the face of natural selection and are therefore unlikely to be found in extensively evolved genomes (Link). Why then, if natural selection does in fact destroy such buffering redundancy over relatively short periods of time, does so much redundancy or genetic “buffering” continue to exist within the human genome? And this is only the tip of the iceberg. “The study of DNA and genetics is beginning to resemble particle physics. Scientists continually find new layers of organization and ever more detailed relationships.” (Link).
There is also evidence that up to 30% of the RNA transcripts that are produced by DNA in various creatures are “conserved” in their structure even if they are not conserved in their sequence. Martin Smith et al. (2013) reported:
When applied to consistency-based multiple genome alignments of 35 [placental and marsupial, including bats, mice, pigs, cows, dolphins and human] mammals, our approach confidently identifies >4 million evolutionarily constrained RNA structures using a conservative sensitivity threshold… These predictions comprise 13.6% of the human genome, 88% of which fall outside any known sequence-constrained element, suggesting that a large proportion of the mammalian genome is functional…
Our findings provide an additional layer of support for previous reports advancing that >20% of the human genome is subjected to evolutionary selection while suggesting that additional evidence for function can be uncovered through careful investigation of analytically involute higher-order RNA structures.
The RNA structure predictions we report using conservative thresholds are likely to span >13.6% of the human genome we report. This number is probably a substantial underestimate of the true proportion given the conservative scoring thresholds employed, the neglect of pseudoknots, the liberal distance between overlapping windows and the incapacity of the sliding-window approach to detect base-pair interactions outside the fixed window length. A less conservative estimate would place this ratio somewhere above 20% from the reported sensitivities measured from native RFAM alignments and over 30% from the observed sensitivities derived from sequence-based realignment of RFAM data. (see also: Link)
But how could RNA secondary structure be conserved if the DNA encoding for it is not? Analysis of the fitness effect of compensatory mutations (HSFP J, 2009) may explain:
It is well known that the folding of RNA molecules into the stem-loop structure requires base pair matching in the stem part of the molecule, and mutations occurring to one segment of the stem part will disrupt the matching, and therefore, have a deleterious effect on the folding and stability of the molecule. It has been observed that mutations in the complementary segment can rescind the deleterious effect by mutating into a base pair that matches the already mutated base, thus recovering the fitness of the original molecule (Kelley et al., 2000; Wilke et al., 2003).
As the diagram helps illustrate, if a base on the top RNA sequence mutates, a base on the bottom can also mutate to match it again, which maintains or “conserves” the same secondary structure – even though the sequence itself is no longer conserved.
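The same idea can be shown with a small sketch. It assumes a simplified stem in which only Watson-Crick pairs (A-U and G-C) count as matches, ignoring G-U wobble pairs and real folding energetics; the sequences and function names are made up for illustration.

```python
# Watson-Crick pairs only; G-U wobble pairs are deliberately ignored here.
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}


def stem_is_paired(top, bottom):
    """True if every position of the top strand pairs with the bottom strand.

    The bottom strand is written so that position i sits opposite
    position i of the top strand.
    """
    if len(top) != len(bottom):
        return False
    return all((t, b) in WATSON_CRICK for t, b in zip(top, bottom))


def mutate(seq, pos, base):
    """Return a copy of seq with the base at index pos replaced."""
    return seq[:pos] + base + seq[pos + 1:]


top, bottom = "GGCA", "CCGU"          # hypothetical original stem
print(stem_is_paired(top, bottom))    # original stem is fully paired

top_mut = mutate(top, 1, "A")         # first mutation breaks one base pair
print(stem_is_paired(top_mut, bottom))

bottom_comp = mutate(bottom, 1, "U")  # compensatory mutation on the other strand
print(stem_is_paired(top_mut, bottom_comp))
print(top_mut != top and bottom_comp != bottom)  # both sequences have changed
```

After the second, compensatory mutation the stem is fully paired again even though neither strand matches its original sequence, so a structure-based comparison would see conservation where a sequence-based comparison would not.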
Considering such discoveries, it is very likely that the functionality of the human genome is well over 8.2% – perhaps to the point where the ENCODE scientists weren’t so crazy after all?