Tuesday, May 24, 2011

Junk & Jonathan: Part 6—Chapter 3

This is part 6 of my review of The Myth of Junk DNA. For a list of other postings on this topic, see the link to Genomes & Junk DNA in the sidebar under "Themes."

We learn in Chapter 9 that Wells has two categories of evidence against junk DNA. The first covers evidence that sequences probably have a function, and the second covers specific known examples of functional sequences. The first category rests on two lines of evidence: transcription and conservation. Both are covered in Chapter 3, making it one of the most important chapters in the book. The second category, specific examples, is described in Chapters 4-7.

The title of Chapter 3 is Most DNA Is Transcribed into RNA. As you might have anticipated, the focus of Wells' discussion is the ENCODE pilot project, which detected abundant transcription in the 1% of the genome that it analyzed (ENCODE Project Consortium, 2007). Their results suggest that most of the genome is transcribed. Other studies support this idea and show that transcripts often overlap and that many of them come from the opposite strand of a gene, giving rise to antisense RNAs.

The original Nature paper says,
... our studies provide convincing evidence that the genome is pervasively transcribed, such that the majority of its bases can be found in primary transcripts, including non-protein-coding transcripts, and those that extensively overlap one another.
The authors of these studies firmly believe that evidence of transcription is evidence of function. This has even led some of them to propose a new definition of a gene [see What is a gene, post-ENCODE?]. There's no doubt that many molecular biologists take this data to mean that most of our genome has a function, and that's the same point that Wells makes in his book. It's evidence against junk DNA.

What are these transcripts doing? Wells devotes a section to "Specific Functions of Non-Protein-Coding RNAs." These RNAs may be news to most readers, but they are well known to biochemists and molecular biologists. This is not the place to describe all the known functional non-coding RNAs, but keep in mind that there are three main categories: ribosomal RNA (rRNA), transfer RNA (tRNA), and a heterogeneous category called small RNAs. There are dozens of different kinds of small RNAs, including unique ones such as the 7SL RNA of the signal recognition particle, the P1 RNA of RNase P, and the guide RNA in telomerase. Other categories include the spliceosome RNAs, snoRNAs, piRNAs, siRNAs, and miRNAs. These RNAs have been studied for decades. It's important to note that the confirmed examples are transcribed from genes that make up less than 1% of the genome.

One interesting category is called "long noncoding RNAs" or lncRNAs. As the name implies, these RNAs are longer than the typical small RNAs. Their functions, if any, are largely unknown, although a few have been characterized. If we add up all the genes for these RNAs and assume they are functional, they will account for about 0.1% of the genome, so this isn't an important category in the discussion about junk DNA.

So, we're left with a puzzle. If more than 90% of the genome is transcribed but we only know about a small number of functional RNAs, then what about the rest?

Opponents of junk DNA—both creationists and scientists—would have you believe that there's a lot we don't know about genomes and RNA. They believe that we will eventually find functions for all this RNA and prove that the DNA that produces them isn't junk. This is a genuine scientific controversy. What do their scientific opponents (I am one) say about the ENCODE result?

Criticisms of the ENCODE analysis take two forms ...
  • The data is wrong and only a small fraction of the genome is transcribed
  • The data is mostly correct but the transcription is spurious and accidental. Most of the products are junk RNA.
Criticisms of the Data

Several papers have appeared that call into question the techniques used by the ENCODE consortium. They claim that many of the identified transcribed regions are artifacts. This is especially true of the repetitive regions of the genome that make up more than half of the total content. If any one of these regions is transcribed, the transcript will likely hybridize to the remaining repeats, giving a false impression of the amount of DNA that is actually transcribed.

Of course, Wells doesn't mention any of these criticisms in Chapter 3. In fact, he implies that every published paper is completely accurate in spite of the fact that most of them have never been replicated and many have been challenged by subsequent work. Readers of The Myth of Junk DNA will be led to assume, whether Wells intends it or not, that if a paper appears in the scientific literature it must be true.

But criticisms of the ENCODE results are so widespread that they can't be ignored, so Wells is forced to deal with them in Chapter 8. (Why not in Chapter 3, when they are first mentioned?) In particular, Wells has to address the van Bakel et al. (2010) paper from Tim Hughes' lab here in Toronto. This paper was widely discussed when it came out last year [see: Junk RNA or Imaginary RNA?]. We'll deal with it when I cover Chapter 8 but, suffice it to say, Wells dismisses the criticism.

Criticisms of the Interpretation

The other form of criticism focuses on the interpretation of the data rather than its accuracy. Most of us who teach transcription take pains to point out to our students that RNA polymerase binds non-specifically to DNA and that much of this binding will result in spurious transcription at a very low frequency. This is exactly what we expect from a knowledge of transcription initiation [How RNA Polymerase Binds to DNA]. The ENCODE data shows that most of the genome is "transcribed" at a frequency of once every few generations (or days) and this is exactly what we expect from spurious transcription. The RNAs are non-functional accidents due to the sloppiness of the process [Useful RNAs?].

Wells doesn't mention any of this. I don't know if that's because he's ignorant of the basic biochemistry and hasn't read the papers or whether he is deliberately trying to mislead his readers. It's probably a bit of both.

It's not as if this is some secret known only to the experts. The possibility of spurious transcription has come up frequently in the scientific literature in the past few years. For example, Guttman et al. (2009) write,
Genomic projects over the past decade have used shotgun sequencing and microarray hybridization to obtain evidence for many thousands of additional non-coding transcripts in mammals. Although the number of transcripts has grown, so too have the doubts as to whether most are biologically functional. The main concern was raised by the observation that most of the intergenic transcripts show little to no evolutionary conservation. Strictly speaking, the absence of evolutionary conservation cannot prove the absence of function. But the remarkably low rate of conservation seen in the current catalogues of large non-coding transcripts (less than 5% of cases) is unprecedented and would require that each mammalian clade evolves its own distinct repertoire of non-coding transcripts. Instead, the data suggest that the current catalogues may consist largely of transcriptional noise, with a minority of bona fide functional lincRNAs hidden amid this background.
This paper is in Wells' reference list, so we know that he has read it.

What these authors are saying is that the data is consistent with spurious transcription (noise). Part of the evidence is the lack of any sequence conservation among the transcripts. It's as though they were mostly derived from junk DNA.

Sequence Conservation

Recall that the purpose of Chapter 3 is to show that junk DNA is probably functional. The first part of the chapter purportedly shows that most of our genome is transcribed. The second part addresses sequence conservation.

Here's what Wells says about sequence conservation.
Widespread transcription of non-protein-coding DNA suggests that the RNAs produced from such DNA might serve biological functions. Ironically, the suggestion that much non-protein-coding DNA might be functional also comes from evolutionary theory. If two lineages diverge from a common ancestor that possesses regions of non-protein-coding DNA, and these regions are really nonfunctional, then they will accumulate random mutations that are not weeded out by natural selection. Many generations later, the sequences of the corresponding non-protein-coding regions in the two descendant lineages will probably be very different. [Due to fixation by random genetic drift—LAM] On the other hand, if the original non-protein-coding DNA was functional, then natural selection will tend to weed out mutations affecting that function. Many generations later, the sequences of the corresponding non-protein-coding regions in the two descendant lineages will still be similar. (In evolutionary terminology, the sequences will be "conserved.") Turning the logic around, Darwinian theory implies that if evolutionarily divergent organisms share similar non-protein-coding DNA sequences, those sequences are probably functional.
Wells then references a few papers that have detected such conserved sequences, including the Guttman et al. (2009) paper mentioned above. They found "over a thousand highly conserved large non-coding RNAs in mammals." Indeed they did, and this is strong evidence of function.1 Every biochemist and molecular biologist will agree. One thousand lncRNAs represent 0.08% of the genome. The sum total of all other conserved sequences is also less than 1%. Wells forgets to mention this in his book. He also forgets to mention the other point that Guttman et al. make; namely, that the lack of sequence conservation suggests that the vast majority of transcripts are non-functional. (Oops!)
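For readers who want to see where a figure like 0.08% comes from, here's a back-of-the-envelope sketch. The genome size (about 3.2 billion base pairs) and the average length per lncRNA locus (2,500 bp) are my illustrative assumptions, not numbers taken from Guttman et al.; the length is simply a plausible value that reproduces the figure above.

```python
# Back-of-the-envelope: what fraction of the genome do ~1,000 conserved lncRNAs occupy?
# ASSUMPTIONS (illustrative only, not from the cited papers):
GENOME_SIZE_BP = 3_200_000_000   # approximate haploid human genome size
NUM_LNCRNAS = 1_000              # "over a thousand" conserved lncRNAs
AVG_LENGTH_BP = 2_500            # hypothetical average length of each lncRNA locus


def percent_of_genome(num_elements: int, avg_length_bp: int,
                      genome_bp: int = GENOME_SIZE_BP) -> float:
    """Percentage of the genome covered by num_elements loci of avg_length_bp each."""
    return 100.0 * num_elements * avg_length_bp / genome_bp


if __name__ == "__main__":
    pct = percent_of_genome(NUM_LNCRNAS, AVG_LENGTH_BP)
    print(f"{pct:.2f}% of the genome")
```

Under these assumptions the result is about 0.08%. Doubling the assumed length only doubles the answer, so the conclusion (a tiny fraction of the genome) doesn't depend much on the exact value chosen.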

There's irony here. We know that the sequences of junk DNA are not conserved, and this is taken as evidence (not conclusive) that the DNA is non-functional. The genetic load argument makes the same point. We know that the vast majority of spurious RNA transcripts are also not conserved from species to species, and this strongly suggests that those RNAs are not functional. Wells ignores this point entirely; it never comes up anywhere in his book. On the other hand, when a small percentage of DNA (and transcripts) is conserved, this gets prominent mention.

Wells doesn't believe in common ancestry so he doesn't believe that sequences are "conserved." (Presumably they reflect common design or something like that.) Nevertheless, when an evolutionary argument of conservation suits his purpose he's happy to invoke it, while, at the same time, ignoring the far more important argument about lack of conservation of the vast majority of spurious transcripts. Isn't that strange behavior?

The bottom line here is that Jonathan Wells is correct to point to the ENCODE data as a problem for junk DNA proponents. This is part of the ongoing scientific controversy over the amount of junk in our genome. Where I fault Wells is his failure to explain to his readers that this is disputed data and interpretation. There's no slam-dunk case for function here. In fact, the tide seems to be turning more and more against the original interpretation of the data. Most knowledgeable biochemists and molecular biologists do not believe that >90% of our genome is transcribed to produce functional RNAs.

UPDATE: How much of the genome do we expect to be transcribed on a regular basis? Protein-encoding genes account for about 30% of the genome, including introns (mostly junk). They will be transcribed. Other genes produce functional RNAs, and together they cover about 3% of the genome. Thus, we expect that roughly a third of the genome will be transcribed at some time during development. We also expect that a lot more of the genome will be transcribed on rare occasions just because of spurious (accidental) transcription initiation. That spurious transcription doesn't count. Some pseudogenes, defective transposons, and endogenous retroviruses have retained the ability to be transcribed on a regular basis. This may account for another 1-2% of the genome. They produce junk RNA.
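The tally in the UPDATE can be written out explicitly. This is just a sketch that adds up the rough percentages given above; the category names are my labels, and spurious transcription is deliberately left out because it doesn't count as regular transcription.

```python
# Tally of the genome fraction expected to be transcribed on a regular basis,
# using the rough percentages from the UPDATE (estimates, not measurements).
expected_genes = {
    "protein-coding genes (exons + introns)": 30.0,
    "genes for functional RNAs": 3.0,
}

# Regularly transcribed junk (some pseudogenes, defective transposons,
# endogenous retroviruses): the text gives a range of 1-2%.
junk_rna_range = (1.0, 2.0)


def total_percent(categories: dict[str, float]) -> float:
    """Sum the genome percentages of the listed categories."""
    return sum(categories.values())


genes_total = total_percent(expected_genes)    # roughly a third of the genome
low = genes_total + junk_rna_range[0]
high = genes_total + junk_rna_range[1]

if __name__ == "__main__":
    print(f"Genes: {genes_total:.0f}%; including transcribed junk: {low:.0f}-{high:.0f}%")
```

Even at the high end, this comes to roughly 35%, a long way from the >90% figure that ENCODE's interpretation implies.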


1. Conservation is not proof of function. In an effort to test the hypothesis that conserved sequences are functional, Nóbrega et al. (2004) deleted two large regions of the mouse genome containing large numbers of conserved non-coding sequences. They found that the mice with the deleted regions showed no phenotypic effects, indicating that the DNA was junk. Jonathan Wells forgot to mention this experiment in his book.

Guttman, M. et al. (2009) Chromatin signature reveals over a thousand highly conserved large non-coding RNAs in mammals. Nature 458:223-227. [NIH Public Access]

Nóbrega, M.A., Zhu, Y., Plajzer-Frick, I., Afzal, V. and Rubin, E.M. (2004) Megabase deletions of gene deserts result in viable mice. Nature 431:988-993. [Nature]

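The ENCODE Project Consortium (2007) Identification and analysis of functional elements in 1% of the human genome by the ENCODE pilot project. Nature 447:799-816. [PDF]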
