Wednesday, April 22, 2009

The Trouble with NCSE

 
I am a member of the National Center for Science Education (NCSE). That does not mean I agree with everything they do. My biggest bone of contention is over their accommodationist tactics [National Academies: Science, Evolution and Creationism, Appeasers, Spaghetti Monsters, and NCSE].

I don't like the fact that NCSE cozies up to theistic evolutionists like Ken Miller and Francis Collins while, at the same time, actively distancing itself from vocal atheist scientists like Richard Dawkins. I think NCSE shouldn't take sides and shouldn't promote the idea that science and religion are compatible.

Jerry Coyne agrees. He has published a lengthy essay on his blog where he takes NCSE to task [Truckling to the Faithful: A Spoonful of Jesus Helps Darwin Go Down]. I have warned many people at NCSE that they risk losing the support of non-theist scientists but, for the most part, they think the risk is worth it.

I wonder if they still think that way?


Tuesday, April 21, 2009

L. CANFORA, Artemidorus Ephesius. P.Artemid. sive Artemidorus personatus


Artemidorus Ephesius. P.Artemid. sive Artemidorus personatus,
edidit brevique commentario instruxit Societas emunctae naris, Bari, Edizioni di Pagina, April 2009.

Title: Artemidorus Ephesius. P. Artemid. sive Artemidorus personatus
Edited by: Canfora L.
Publisher: Edizioni di Pagina
Publication date: 2009
Series: Ekdosis
ISBN: 887470089X
ISBN-13: 9788874700899
Pages: 52
Section: Classical literature
INDEX

Praefatio

Quid fuerit Artemidori Geographia et quomodo eam Marcianus in epitomen reduxerit 2

Siglorum conspectus, p. 6

Artemidorus personatus

Col. I, p. 8 - Col. II, p. 14 - Col. III, p. 17 - Col. IV, p. 18 - Col. V, p. 24

De personati nostri usu scribendi, p. 33 - Summatim dictum, p. 34

Artemidori Geographikon Liber I edidit Claudius Schiano 35

Appendix. Artemidori Hispania in Strabonis opere 49

Editiones quas adiimus 52

How to Evaluate Genome Level Transcription Papers

It's often very difficult to evaluate the results of large-scale genome studies. Part of the problem is that the technology is complicated and the controls are not obvious. Part of the problem is that the results depend a great deal on the software used to analyze the data and the limitations of the software are often not described.

But those aren't the only problems. We also have to take into consideration the biases of the people who write the papers. Some of those biases are the same ones we see in other situations except that they are less obvious in the case of large-scale genome studies.

Laurence Hurst has written up a nice summary of the problem and I'd like to quote from his recent paper (Hurst, 2009).
In the 1970s and 80s there was a large school of evolutionary biology, much of it focused on understanding animal behavior, that to a first approximation assumed that whatever trait was being looked at was the product of selection. Richard Dawkins is probably the most widely known advocate for this school of thought, John Maynard Smith and Bill (WD) Hamilton its main proponents. The game played in this field was one in which ever more ingenious selectionist hypotheses would be put forward and tested. The possibility that selection might not be the answer was given short shrift.

By contrast, during the same period non-selectionist theories were gaining ground as the explanatory principle for details seen at the molecular level. According to these models, chance plays an important part in determining the fate of a new mutation – whether it is lost or spreads through a population. Just as a neutrally buoyant particle of gas has an equal probability of diffusing up or down, so too in Motoo Kimura's neutral theory of molecular evolution an allele with no selective consequences can go up or down in frequency, and sometimes replace all other versions in the population (that is, it reaches fixation). An important extension of the neutral theory (the nearly-neutral theory) considers alleles that can be weakly deleterious or weakly advantageous. The important difference between the two theories is that in a very large population a very weakly deleterious allele is unlikely to reach fixation, as selection is given enough opportunity to weed out alleles of very small deleterious effects. By contrast, in a very small population a few chance events increasing the frequency of an allele can be enough for fixation. More generally then, in large populations the odds are stacked against weakly deleterious mutations and so selection should be more efficient in large populations.

In this framework, mutations in protein-coding genes that are synonymous – that is, that replace one codon with another specifying the same amino acid and, therefore, do not affect the protein – or mutations in the DNA between genes (intergene spacers) are assumed to be unaffected by selection. Until recently, a neutralist position has dominated thinking at the genomic/molecular level. This is indeed reflected in the use of the term 'junk DNA' to describe intergene spacer DNA.

These two schools of thought then could not be more antithetical. And this is where genome evolution comes in. The big question for me is just what is the reach of selection. There is little argument about selection as the best explanation for gross features of organismic anatomy. But what about more subtle changes in genomes? Population genetics theory can tell you that, in principle, selection will be limited when the population comprises few individuals and when the strength of selection against a deleterious mutation is small. But none of this actually tells you what the reach of selection is, as a priori we do not know what the likely selective impact of any given mutation will be, not least because we cannot always know the consequences of apparently innocuous changes. The issue then becomes empirical, and genome evolution provides a plethora of possible test cases. In examining these cases we can hope to uncover not just what mutations selection is interested in, but also to discover why, and in turn to understand how genomes work. Central to the issue is whether our genome is an exquisite adaption or a noisy error-prone mess.
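Hurst's point about population size can be made concrete with Kimura's diffusion approximation, a standard population-genetics result (my illustration, not something taken from Hurst's paper). The fixation probability of a new mutation depends on the product of population size and selection coefficient:

```python
import math

def fixation_probability(s, N):
    """Kimura's diffusion approximation for the fixation probability of a
    new mutant allele (initial frequency 1/(2N)) with selection coefficient s
    in a diploid population of effective size N."""
    if s == 0:
        return 1.0 / (2 * N)  # neutral case: pure genetic drift
    return (1 - math.exp(-2 * s)) / (1 - math.exp(-4 * N * s))

# A weakly deleterious allele (s = -0.0001) behaves almost neutrally in a
# small population but is efficiently eliminated in a large one.
print(fixation_probability(-1e-4, 100))      # close to the neutral value 1/200
print(fixation_probability(-1e-4, 100_000))  # effectively zero
```

For s = -0.0001 the allele fixes almost as often as a neutral one when N = 100, but essentially never when N = 100,000. That is what "selection is more efficient in large populations" means in practice.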
Sandwalk readers will be familiar with this problem. In the context of genome studies, the adaptationist approach is most often reflected as a bias in favor of treating all observations as evidence of functionality. If you detect it, then it must have been selected. If it was selected, it must be important.

As Hurst points out, the real question in evaluating genome studies boils down to a choice between an exquisitely adapted genome or one that is messy and full of mistakes. The battlefields are studies on the frequency of alternative splicing, transcription, the importance of small RNAs, and binding sites for regulatory proteins.

Let's take transcription studies as an example.
Consider, for example, the problem of transcription. Although maybe only 5% of the human genome comprises genes encoding proteins, the great majority of the DNA in our genome is transcribed into RNA [1]. In this the human genome is not unusual. But is all this transcription functionally important? The selectionist model would propose that the transcription is physiologically relevant. Maybe the transcripts specify previously unrecognized proteins. If not, perhaps the transcripts are involved in RNA-level regulation of other genes. Or the process of transcription may be important in keeping the DNA in a configuration that enables or suppresses transcription from closely linked sites.

The alternative model suggests that all this excess transcription is unavoidable noise resulting from promiscuity of transcription-factor binding. A solid defense can be given for this. If you take 100 random base pairs of DNA and ask what proportion of the sequence matches some transcription factor binding site in the human genome, you find that upwards of 50% of the random sequence is potentially bound by transcription factors and that there are, on average, 15 such binding sites per 100 nucleotides. This may just reflect our poor understanding of transcription factor binding sites, but it could also mean that our genome is mostly transcription factor binding site. If so, transcription everywhere in the genome is just so much noise that the genome must cope with.
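The back-of-the-envelope claim in that last paragraph is easy to play with yourself. This toy simulation (my own illustration; the five 6-mer "motifs" are invented, and real binding sites are described by weight matrices rather than exact strings) scans random 100-bp sequences for short motifs:

```python
import random

random.seed(1)
BASES = "ACGT"

# Hypothetical fixed 6-mers standing in for transcription-factor binding
# sites. Real sites are fuzzier, but fixed strings make the point that
# short motifs inevitably turn up in random sequence.
MOTIFS = ["TATAAA", "GGGCGG", "CACGTG", "TGACTC", "GCCAAT"]

def count_motif_hits(seq, motifs):
    """Count non-overlapping occurrences of each motif in seq."""
    return sum(seq.count(m) for m in motifs)

trials = 1000
hits = 0
for _ in range(trials):
    seq = "".join(random.choice(BASES) for _ in range(100))
    hits += count_motif_hits(seq, MOTIFS)
print(hits / trials, "expected motif matches per 100 bp of random DNA")
```

With only five toy motifs the hit rate is low, but the expected number of matches grows linearly with the size of the motif collection, so against the hundreds of known transcription-factor motifs random DNA accumulates matches at something like the density Hurst quotes.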
There is no definitive solution to this conflict. Both sides have passionate advocates and right now you can't choose one over the other. My own bias is that most of the transcription is just noise—it is not biologically relevant.

That's not the point, however. The point is that as a reader of the scientific literature you have to make up your mind whether the data and the interpretation are believable.

Here are two criteria that I use to evaluate a paper on genome level transcription.
  1. I look to see whether the authors are aware of the adaptation vs noise controversy. If they completely ignore the possibility that what they are looking at could be transcriptional noise, then I tend to dismiss the paper. It is not good science to ignore alternative hypotheses. Furthermore, such papers will hardly ever have controls or experiments that attempt to falsify the adaptationist interpretation. That's because they are unaware of the fact that a controversy exists.1
  2. Does the paper have details about the abundance of individual transcripts? If the paper is making the case for functional significance then one of the important bits of evidence is reporting on the abundance of the rare transcripts. If the authors omit this bit of information, or skim over it quickly, then you should be suspicious. Many of these rare transcripts are present in less than one or two copies per cell and that's perfectly consistent with transcriptional noise—even if it's only one cell type that's expressing the RNA. There aren't many functional roles for an RNA whose concentration is in the nanomolar range. Critical thinkers will have thought about the problem and be prepared to address it head-on.
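The copies-per-cell point is just Avogadro arithmetic. A quick sketch (the cell volumes are order-of-magnitude assumptions on my part):

```python
AVOGADRO = 6.022e23  # molecules per mole

def copies_to_molar(copies, cell_volume_liters):
    """Concentration (mol/L) corresponding to a copy number in one cell."""
    return copies / (AVOGADRO * cell_volume_liters)

# Illustrative, order-of-magnitude cell volumes:
volumes = {
    "bacterium (~1 fL)": 1e-15,
    "yeast cell (~40 fL)": 4e-14,
    "mammalian cell (~2 pL)": 2e-12,
}
for name, v in volumes.items():
    print(f"1 copy in a {name}: {copies_to_molar(1, v):.2e} M")
```

One copy per cell works out to a low-nanomolar concentration in a bacterium and far less in a large mammalian cell, which is why transcripts this rare are hard to reconcile with most proposed regulatory functions.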


1. Or, maybe they know there's a controversy but they don't want you to be thinking about it as you read their paper. Or, maybe they think the issue has been settled and the "messy" genome advocates have been routed. Either way, these are not authors you should trust.

Hurst, L.D. (2009) Evolutionary genomics and the reach of selection. Journal of Biology 8:12 [DOI:10.1186/jbiol113]

Monday's Molecule #118: Winners

 
UPDATE: The molecule is cyclin-dependent kinase 2 (CDK2), a protein kinase that helps regulate the cell cycle [PDB 1b38]. The Nobel Laureate is Paul Nurse.

This week's winners are Mike Fraser of Toronto and Alex Ling of the University of Toronto.


This is a very famous protein but most of you won't be able to identify it from the structure alone. You'll need a hint of some sort.

Letting you know that the ligands are Mg2+ and adenosine-5′-triphosphate might not be enough so I'll also tell you that one of the authors on the structure paper was M.E. Noble.

There is one Nobel Laureate who is most closely identified with the function of this particular molecule, although that scientist was NOT the first to identify it. You have to identify the Nobel Laureate who got the prize for working out the function of the protein.

The first person to identify the molecule and the Nobel Laureate wins a free lunch at the Faculty Club. Previous winners are ineligible for one month from the time they first won the prize.

There are six ineligible candidates for this week's reward: Bill Chaney of the University of Nebraska, Elvis Cela from the University of Toronto, Peter Horwich from Dalhousie University, Devin Trudeau from the University of Toronto, Shumona De of Dalhousie University, and Maria Altshuler of the University of Toronto.

I note that Canadians are trouncing the rest of the world. That's as it should be.

I still have one extra free lunch donated by a previous winner to a deserving undergraduate so I'm going to continue to award an additional free lunch to the first undergraduate student who can accept it. Please indicate in your email message whether you are an undergraduate and whether you can make it for lunch.

THEME:

Nobel Laureates
Send your guess to Sandwalk (sandwalk (at) bioinfo.med.utoronto.ca) and I'll pick the first email message that correctly identifies the molecule and names the Nobel Laureate(s). Note that I'm not going to repeat Nobel Prizes so you might want to check the list of previous Sandwalk postings by clicking on the link in the theme box.

Correct responses will be posted tomorrow.

Comments are now open.



Sequenced genomes contain thousands of "unknown" genes

 
The total number of genes in the human genome has dropped from the initial estimates of 30-35,000 to about 25,000. Of these, more than 4,000 encode functional RNAs, leaving about 20,500 protein-encoding genes in the human genome [Humans Have Only 20,500 Protein-Encoding Genes].

Up to 40% of these protein-encoding genes are "unknown" in the sense that no function has been assigned to their protein products. In the jargon of genomics, the genes are "unannotated," meaning that nobody has assigned a function to the gene in the human genome database (Reichardt, 2007).

That means 8,000 unknown genes. About 1000 of these genes are "orphan" genes—genes that have no homologues in other species, including chimpanzees (Clamp et al., 2007).

Humans aren't unique. All sequenced eukaryotic genomes have a high percentage (~30-40%) of "unknown" protein-encoding genes.

A new paper in PLoS One looks at the "unknown" genes in the filamentous fungus Neurospora crassa (pink bread mold) (Kasuga et al. 2009). The Neurospora genome has about 9,000 protein-encoding genes and more than half of them have not been annotated. They are the "unknown" genes.

The genomes of about 40 different species of fungus have been sequenced and many of these are filamentous fungi related to Neurospora. What this means is that it's possible to compare the Neurospora genes to those in many different genomes from closely related species; those that are part of the same family (less closely related); part of the same phylum; and distantly related. You can't do such an extensive study with human genomes because there aren't very many mammalian genomes that have been sequenced and carefully annotated. A draft sequence of the chimpanzee genome, for example, has been published but it is neither complete nor reliable enough for genomic comparisons. The only other primate genome is from macaque (Rhesus monkey) and that's far from finished. (The human and mouse genomes are the only ones listed as "complete" on the NCBI/Entrez website.)

The question is: are the unknown genes confined to Neurospora and its close relatives? If so, it would suggest that new genes have evolved within the past several million years and that's why we don't know their function.

Kasuga et al. created six sets of genes ...
  1. Genes with homologs in distantly related eukaryotes and possibly prokaryotes. These are ancient genes.
  2. Genes that are only found in fungi and not in plants or animals or protists (Dikarya).
  3. Genes found only in Ascomycetes.
  4. Genes confined to the Pezizomycotina clade to which Neurospora belongs.
  5. Genes found only in Neurospora.
  6. Others: genes found in some of the broader groupings but not in all of the narrower groupings.
The classification depends on the similarity cutoff. If the lowest cutoff is 25% sequence identity, then there will be more homologs in the eukaryote or prokaryote class than if the cutoff is raised to 35%. The distribution of the various classes at each of three minimum sequence identity cutoffs is shown in their second figure.
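The cutoff dependence works like this: each gene is assigned to the broadest taxonomic group in which it still has a detectable homolog, so raising the identity threshold pushes genes into narrower classes. Here is a toy sketch of that logic (my own illustration, not the authors' actual pipeline; the group names and identity values are invented):

```python
# Ordered from broadest to narrowest phylogenetic class.
CLASSES = ["eukaryotes/prokaryotes", "Dikarya", "Ascomycetes",
           "Pezizomycotina", "Neurospora only"]

def classify(best_hits, cutoff):
    """best_hits: dict mapping group name -> best percent identity found
    for a gene's homologs in that group. Returns the broadest group with
    a hit at or above the cutoff, or 'Neurospora only' (orphan) if none."""
    for group in CLASSES[:-1]:
        if best_hits.get(group, 0) >= cutoff:
            return group
    return "Neurospora only"

# A hypothetical gene with a weak ancient homolog and a strong fungal one.
gene = {"eukaryotes/prokaryotes": 28, "Dikarya": 41, "Ascomycetes": 67}
print(classify(gene, 25))  # the weak ancient homolog counts at a 25% cutoff
print(classify(gene, 35))  # the same gene drops to the Dikarya class at 35%
```

The same gene lands in the ancient class at a 25% cutoff but in the Dikarya class at 35%, which is exactly why the distribution in their figure shifts as the threshold is raised.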


Taking the 30% threshold numbers (middle group), it looks like there are 2,358 highly conserved genes with homologs in distantly related eukaryotes and prokaryotes. In contrast, there are 2,219 genes that don't have homologs in any other species. These are the orphan genes in Neurospora.

You might expect that most of the unknown/unannotated genes would be confined to Neurospora and closely related species. You might expect that highly conserved genes would be more likely to have been identified. That's partly true. Here are the numbers.


Only 16.5% of the highly conserved genes are mystery genes of unknown function. While this is much lower than the total (56%), it's still surprising that so many of the core genes remain unidentified. Presumably they are doing something very important. There are dozens of thesis projects available for talented graduate students who want to make a valuable contribution to biology.

It's not a surprise that 94% of the orphans are unannotated. These genes are likely to be new genes that have evolved recently in Neurospora and they would be expected to carry out unusual reactions that aren't found in other species. These "genes" are also the ones most likely to be artifacts (false positives) of the gene searching software. They may not be genes at all.


[Image Credit: Neurospora-National Institute of General Medical Sciences]

Clamp, M., Fry, B., Kamal, M., Xie, X., Cuff, J., Lin, M.F., Kellis, M., Lindblad-Toh, K. and Lander, E.S. (2007) Distinguishing protein-coding and noncoding genes in the human genome. Proc. Natl. Acad. Sci. (USA) 104:19428-19433. [DOI 10.1073/pnas.0709013104]

Kasuga, T., Mannhaupt, G., and Glass, N.L. (2009) Relationship between Phylogenetic Distribution and Genomic Features in Neurospora crassa. PLoS ONE 4(4):e5286. [DOI:10.1371/journal.pone.0005286]

Reichardt, J.K.V. (2007) Quo vadis, genoma? A call to pipettes for biochemists. Trends in Biochemical Sciences (TIBS) 32:529-530. [DOI:10.1016/j.tibs.2007.10.001]

Conservative Spin

 

Canadian Cynic has built a career out of keeping an eye on The Blogging Tories. Every now and then CC comes up with something that makes you scratch your head and ask, "Can The Blogging Tories really be that stupid?"

Here's a posting from ErwinGerrits.com that will answer the question.
Funny how the current deficit budget is now universally referred to as “The Conservative Deficit”, even after this current budget was forced upon us by the Liberals, NDPers and the bloc-heads after a mid-winter stand-off on the Governour General’s front stoop. As I recall, the Conservative’s Economic Update, brought forward in December, did not make us go into a deficit at all. It was after the three stooges reared their ugly heads and blackmailed the country, that the current deficit budget was tabled.


Monday, April 20, 2009

International team cracks mammalian gene control code


Stop the presses! Revise the textbooks! John Mattick and his collaborators have discovered how genes are controlled in mammals.

Anyone who knows Mattick's past history will know what's coming—Mattick overthrew the Central Dogma of Molecular Biology over six years ago (Mattick, 2003; Mattick, 2004).1,2
An international consortium of scientists, including researchers from The University of Queensland (UQ), have probed further into the human genome than ever before.

They have discovered how genes are controlled in mammals, as well as the tiniest genetic element ever found.

Their discoveries will be published in three milestone papers in leading journal Nature Genetics.


1. See Basic Concepts: The Central Dogma of Molecular Biology for the truth about the Central Dogma.

2. See Greg Laden Gets Suckered by John Mattick for an example of how easy it is to get fooled by John Mattick.

Mattick, J.S. (2003) Challenging the dogma: the hidden layer of non-protein-coding RNAs in complex organisms. BioEssays 25:930-939.

Mattick, J.S. (2004) The hidden genetic program of complex organisms. Sci. Am. 291:60-67.