Reed A. Cartwright (De Rerum Natura) has just posted a summary of his recently published paper on the effect of gap costs in sequence alignment [Logarithmic gap costs decrease alignment accuracy].
It sounds esoteric but, in fact, it's a very important problem. Computer-driven sequence alignments are behind a great deal of the bioinformatics that's being published today. Surprisingly, no computer program can do as good a job at global sequence alignment as a competent student. This should be cause for concern, since it means that all the published work relying on those alignments is sub-optimal because the algorithms aren't up to the task. Most workers don't acknowledge this. I suspect they simply don't realize that the alignment programs are inaccurate.
Reed looked at a particular problem in sequence alignment. The only difficult part about sequence alignment is placing the gaps that are due to insertions and deletions (indels) that have accumulated since the two sequences diverged from a common ancestor. During automated sequence alignment the program has to assign a penalty, or cost, for inserting gaps in the alignment. If there were no penalty associated with indels then the program would insert gaps willy-nilly to avoid every mismatch. The idea is to limit the placement of gaps to only those locations where they truly represent an evolutionary event.
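To see why the penalty matters, here is a minimal sketch of a textbook global alignment (Needleman-Wunsch style) with a simple per-residue gap penalty. It is not the algorithm from Reed's paper or from any particular alignment program; the function name, the scoring values, and the two toy sequences are all assumptions made up for illustration.

```python
def global_alignment(x, y, match=1, mismatch=-1, gap=-2):
    """Return (score, aligned_x, aligned_y) for a simple global alignment.

    Illustrative only: every gapped residue costs `gap`, with no separate
    cost for opening a gap. All scoring values are hypothetical.
    """
    n, m = len(x), len(y)

    def sub(i, j):
        # Score for aligning x[i-1] with y[j-1].
        return match if x[i - 1] == y[j - 1] else mismatch

    # score[i][j] = best score for aligning x[:i] with y[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # align two residues
                score[i - 1][j] + gap,            # gap in y
                score[i][j - 1] + gap,            # gap in x
            )

    # Trace back from the bottom-right corner to recover one best alignment.
    ax, ay = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            ax.append(x[i - 1]); ay.append(y[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            ax.append(x[i - 1]); ay.append("-"); i -= 1
        else:
            ax.append("-"); ay.append(y[j - 1]); j -= 1
    return score[n][m], "".join(reversed(ax)), "".join(reversed(ay))


# Two toy sequences (hypothetical) that differ at a single position.
with_penalty = global_alignment("GATTACA", "GACTACA", gap=-2)
no_penalty = global_alignment("GATTACA", "GACTACA", gap=0)
print(with_penalty)
print(no_penalty)
```

With a gap penalty in place, the program accepts the single mismatch and inserts no gaps. With the penalty set to zero, it opens a gap in each sequence just to avoid that one mismatch, which is exactly the willy-nilly behavior described above.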
The standard penalty is represented by the formula $G_k = a + bk$, where $G_k$ is the penalty for a gap that is "k" residues long. There are two components to the penalty: "a" is the penalty for creating a gap, and "b" is the penalty for each residue the gap extends over.
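A quick sketch of the arithmetic may help. The function below just evaluates the standard formula, G_k = a + bk, for a single gap; the function name and the values chosen for "a" and "b" are hypothetical, not taken from Reed's paper or from any alignment program.

```python
def affine_gap_cost(k, a, b):
    """Cost G_k = a + b*k of one gap spanning k residues."""
    if k < 1:
        raise ValueError("gap length k must be at least 1")
    return a + b * k


# Hypothetical values: creating a gap is expensive, extending it is cheap.
a, b = 10, 1

one_long_gap = affine_gap_cost(6, a, b)          # one gap of 6 residues
three_short_gaps = 3 * affine_gap_cost(2, a, b)  # three gaps of 2 residues each

print(one_long_gap, three_short_gaps)
```

Both scenarios account for six gapped residues, but the three short gaps pay the creation penalty three times. When "a" is large relative to "b", this form of penalty favors a few long gaps over many short ones.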
Reed tested several other types of gap penalties to see if they did a better job at aligning sequences. You should read his posting to see the surprising result. His paper is available here.
Here's an example of a computer-generated multiple sequence alignment from the Pfam database [HSP70 alignments]. The protein is HSP70, the major protein chaperone. If you look at the right-hand side of the first page you can see how the algorithm placed the gaps (represented by dots). Most of you could do a better job with just a little practice.