Friday, May 25, 2012

The Importance of the Null Hypothesis

Jonathan Eisen of The Tree of Life is hosting a series of guest postings by the authors of recently published papers. The latest is a guest post by Josh Weitz about their paper in BMC Genomics: A neutral theory of genome evolution and the frequency distribution of genes. The paper tries to explain the concept of a pan genome, where closely related species, or strains, each have a subset of the total number of genes in the entire collection of species/strains. Why do some strains have some genes and not others?

Josh Weitz makes a point that bears repeating because most people just don't understand it.
So, let me be clear: I do think that genes matter to the fitness of an organism and that if you delete/replace certain genes you will find this can have mild to severe to lethal costs (and occasional benefits). However, our point in developing this model was to try and create a baseline null model, in the spirit of neutral theories of population genetics, that would be able to reproduce as much of the data with as few parameters as possible. Doing so would then help identify what features of gene compositional variation could be used as a means to identify the signatures of adaptation and selection. Perhaps this point does not even need to be stated, but obviously not everyone sees it the same way. In fact, Eugene Koonin has made a similar argument in his nice paper, Are there laws of adaptive evolution: "the null hypothesis is that any observed pattern is first assumed to be the result of non-selective, stochastic processes, and only once this assumption is falsified, should one start to explore adaptive scenarios". I really like this quote, even if I don't always follow this rule (perhaps I should). It's just so tempting to explore adaptive scenarios first, but it doesn't make it right.

