Thursday, 31st May 2012
I'm very excited that my paper (cowritten with Steve Kelly) on MergeAlign has finally come out in BMC Bioinformatics. We managed to time things well, so the website we've created - mergealign.com - is also up and fully functional.
MergeAlign is a program that combines many different alignments of the same set of sequences into a single consensus alignment. It works by counting how often each column appears across all the different alignments and then optimally combining the columns using a dynamic programming approach similar to that used to align two sequences. I've described the process in more detail with a fancy animated HTML5 Canvas here.
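To give a rough feel for the idea, here is a minimal Python sketch. It is not the MergeAlign source: the function names (`alignment_to_path`, `merge_alignments`) are made up for illustration, and it assumes that each alignment is a list of gapped strings with `-` for gaps, and that the best consensus is simply the path with the highest summed column count, which may not be exactly how the published method scores paths.

```python
from collections import defaultdict

def alignment_to_path(rows):
    """Describe an alignment as a list of coordinates: after each column,
    how many residues of each sequence have been used so far."""
    used = [0] * len(rows)
    path = [tuple(used)]
    for column in zip(*rows):
        for i, ch in enumerate(column):
            if ch != "-":
                used[i] += 1
        path.append(tuple(used))
    return path

def merge_alignments(alignments):
    """Combine alignments of the same sequences into one consensus
    (illustrative scoring: maximise the summed column counts)."""
    sequences = [row.replace("-", "") for row in alignments[0]]

    # Count how many alignments contain each column (a step between coordinates).
    weight = defaultdict(int)
    for rows in alignments:
        path = alignment_to_path(rows)
        for a, b in zip(path, path[1:]):
            if a != b:  # ignore columns made only of gaps
                weight[(a, b)] += 1

    incoming = defaultdict(list)
    for (a, b), w in weight.items():
        incoming[b].append((a, w))

    start = (0,) * len(sequences)
    end = tuple(len(s) for s in sequences)

    # Every column uses at least one residue, so sorting coordinates by their
    # sum visits each one after all of its predecessors.
    best = {start: (0, None)}
    for node in sorted(incoming, key=sum):
        best[node] = max((best[a][0] + w, a) for a, w in incoming[node])

    # Trace the best path back from the end to the start.
    path = [end]
    while path[-1] != start:
        path.append(best[path[-1]][1])
    path.reverse()

    # Turn the path back into gapped sequences.
    return [
        "".join(seq[a[i]] if b[i] > a[i] else "-" for a, b in zip(path, path[1:]))
        for i, seq in enumerate(sequences)
    ]

# Two alignments agree on a staggered layout; a third aligns the sequences end to end.
staggered = ["AC-", "-CT"]
end_to_end = ["AC", "CT"]
merged = merge_alignments([staggered, staggered, end_to_end])
print("\n".join(merged))
```

In this toy input the columns of the staggered alignment each appear twice, so the consensus follows that layout rather than the end-to-end one.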
On the website we have now added a place to upload a single set of sequences, which we align using 91 selected matrices. We then use MergeAlign to combine these alignments into a single consensus alignment. The advantage of using MergeAlign (other than the alignments it produces being more accurate, on average, than any of the alignments you put into it) is that we can generate a percentage support for every column in the final alignment, showing how many of the constituent alignments agree that the residues in that column are aligned. This is particularly useful if you're going to use the alignment to build a tree and wish to only use columns you have a lot of confidence in. I created an output page which hopefully makes selecting a sensible threshold very simple and gives you an immediate feel for which regions of an alignment have the most support.
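As an illustration of what the column support means, here is a small Python sketch of how it could be computed and used to filter an alignment. This is an assumption-laden toy rather than the website's code: the helper names (`columns_of`, `column_support`, `filter_columns`) are hypothetical, alignments are lists of gapped strings with `-` for gaps, and two columns count as "the same" when they contain the same residue positions from each sequence.

```python
def columns_of(rows):
    """List each column as a tuple of residue indices, with None for a gap."""
    seen = [0] * len(rows)
    columns = []
    for column in zip(*rows):
        key = []
        for i, ch in enumerate(column):
            if ch == "-":
                key.append(None)
            else:
                key.append(seen[i])
                seen[i] += 1
        columns.append(tuple(key))
    return columns

def column_support(final, alignments):
    """Percentage of the input alignments that contain each column of `final`."""
    column_sets = [set(columns_of(rows)) for rows in alignments]
    return [
        100.0 * sum(col in s for s in column_sets) / len(column_sets)
        for col in columns_of(final)
    ]

def filter_columns(final, support, threshold):
    """Keep only the columns whose support is at least `threshold` percent."""
    keep = [i for i, s in enumerate(support) if s >= threshold]
    return ["".join(row[i] for i in keep) for row in final]

# Toy input: two alignments agree with the final one, a third differs
# everywhere except the last column.
final = ["AC-G", "-CTG"]
other = ["ACG", "CTG"]
support = column_support(final, [final, final, other])
trimmed = filter_columns(final, support, threshold=90)
print([round(s, 1) for s in support], trimmed)
```

Here only the last column is found in all three input alignments, so it is the only one that survives a 90% threshold.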