corneliabl: Junk RNA or Imaginary RNA?

RNA is very popular these days. It seems as though new varieties of RNA are being discovered just about every month. There have been breathless reports claiming that almost all of our genome is transcribed and that most of this RNA has to be functional even though we don't yet know what the function is. The fervor with which some people advocate a paradigm shift in thinking about RNA approaches that of a cult follower [see Greg Laden Gets Suckered by John Mattick].

We've known for decades that there are many types of RNA besides messenger RNA (mRNA encodes proteins). Besides the standard ribosomal RNAs and transfer RNAs (tRNAs), there are a variety of small RNAs required for splicing and many other functions. There's no doubt that some of the new discoveries are important as well. This is especially true of small regulatory RNAs.

However, the idea that a huge proportion of our genome could be devoted to synthesizing functional RNAs does not fit with the data showing that most of our genome is junk [see Shoddy But Not "Junk"?]. That hasn't stopped RNA cultists from promoting experiments leading to the conclusion that almost all of our genome is transcribed.

That may change. A paper just published in PLoS Biology shows that the earlier work was prone to artifacts. Some of those RNAs may not even be there and others are present in tiny amounts.

Late to the Party

Several people have already written about this paper including Carl Zimmer and PZ Myers. There are also summaries in Nature News and PLoS Biology.

The work was done by Harm van Bakel in Tim Hughes' lab, right here in Toronto. It's only a few floors, and a bridge, from where I'm sitting right now. The title of their paper tries to put a positive spin on the results: "Most 'Dark Matter' Transcripts Are Associated With Known Genes" [van Bakel et al. (2010)]. Nobody's buying that spin. They all recognize that the important result is not that non-coding RNAs are mostly associated with genes but the fact that they are not found in the rest of the genome. In other words, most of our genome is not transcribed in spite of what was said in earlier papers.

Van Bakel compared two different types of analysis. The first, called "tiling arrays," is a technique where bulk RNA (cDNA, actually) is hybridized to a series of probes on a microchip. The probes are short pieces of DNA corresponding to genomic sequences spaced every few thousand base pairs along each chromosome. When some RNA fragment hybridizes to one of these probes you score that as a "hit." The earlier experiments used this technique and the results indicated that almost every probe could hybridize an RNA fragment. Thus, as you scanned the chip you saw that almost every spot recorded a "hit." The conclusion was that almost all of the genome is transcribed even though only 2% corresponds to known genes.

The second type of analysis is called RNA-Seq and it relies on direct sequencing of RNA fragments. Basically, you copy the RNA into DNA, selecting for small 200 bp fragments. Using new sequencing technology, you then determine the sequence of one (single end) or both ends (paired end) of this cDNA. You may only get 30 bp of good sequence information but that's sufficient to place the transcript on the known genome sequence. By collecting millions of sequence reads, you can determine what parts of the genome are transcribed and you can also determine the frequency of transcription. The technique is much more quantitative than tiling experiments.
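To make the counting step concrete, here's a minimal sketch of how mapped reads might be sorted into "genic" and "intergenic" bins. It is not the pipeline used in the paper. The gene coordinates, the read positions, and the function names `is_genic` and `genic_fraction` are all made up for illustration, and it assumes a single chromosome with sorted, non-overlapping gene intervals.

```python
from bisect import bisect_right

# Hypothetical gene intervals on one chromosome: sorted, non-overlapping,
# half-open (start, end) pairs. These are illustrative, not real annotations.
GENES = [(1_000, 5_000), (12_000, 20_000), (40_000, 42_500)]


def is_genic(pos, genes=GENES):
    """Return True if a mapped read position falls inside a gene interval."""
    starts = [start for start, _ in genes]
    # Index of the last interval whose start is <= pos (or -1 if none).
    i = bisect_right(starts, pos) - 1
    return i >= 0 and genes[i][0] <= pos < genes[i][1]


def genic_fraction(read_positions, genes=GENES):
    """Fraction of mapped reads whose position lies within an annotated gene."""
    if not read_positions:
        raise ValueError("no reads to count")
    hits = sum(is_genic(p, genes) for p in read_positions)
    return hits / len(read_positions)


# Made-up mapped read start positions.
reads = [1_200, 4_999, 5_000, 13_000, 19_999, 30_000, 41_000, 41_500]
print(genic_fraction(reads))  # 6 of the 8 reads fall inside a gene: 0.75
```

With millions of real reads instead of eight invented ones, the same kind of tally tells you what share of the sequenced RNA comes from known genes and what share comes from the regions in between.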

Van Bakel et al. show that using RNA-Seq they detect very little transcription from the regions between genes. On the other hand, using tiling arrays they detect much more transcription from these regions. They conclude that the tiling arrays are producing spurious results, possibly due to cross-hybridization or possibly due to detection of very low abundance transcripts. In other words, the conclusion that most of our genome is transcribed may be an artifact of the method.

The parts of the genome that are presumed to be transcribed but for which there is no known function are called "dark matter." Here's the important finding in the authors' own words.

To investigate the extent and nature of transcriptional dark matter, we have analyzed a diverse set of human and mouse tissues and cell lines using tiling microarrays and RNA-Seq. A meta-analysis of single- and paired-end read RNA-Seq data reveals that the proportion of transcripts originating from intergenic and intronic regions is much lower than identified by whole-genome tiling arrays, which appear to suffer from high false-positive rates for transcripts expressed at low levels.

Many of us dismissed the earlier results as transcriptional noise or "junk RNA." We thought that much of the genome could be transcribed at a very low level but this was mostly due to accidental transcription from spurious promoters. This low level of "accidental" transcription is perfectly consistent with what we know about RNA polymerase and DNA binding proteins [What is a gene, post-ENCODE?, How RNA Polymerase Binds to DNA]. Although we might have suspected that some of the "transcription" was a true artifact, it was difficult to see how the papers could have failed to consider such a possibility. They had been through peer review and the reviewers seemed to be satisfied with the data and the interpretation.

That's gonna change. I suspect that from now on everybody is going to ignore the tiling array experiments and pretend they don't exist. Not only that, but in light of recent results, I suspect more and more scientists will announce that they never believed the earlier results in the first place. Too bad they never said that in print.

van Bakel, H., Nislow, C., Blencowe, B. and Hughes, T. (2010) Most "Dark Matter" Transcripts Are Associated With Known Genes. PLoS Biology 8: e1000371 [doi:10.1371/journal.pbio.1000371]

corneliabl

Thursday, May 20, 2010
