Convert distance matrix to phylip format

I wrote a function to create a Euclidean distance matrix of some amino acid substitution matrices, and I wanted to find a built-in method to compute the Spearman's rank correlation of two lists so I could create a distance matrix that way too. I found that Biopython actually has a function that builds distance matrices using various distance metrics, including Euclidean and Spearman's rank:

import Bio.Cluster
# data: one row per item to compare (e.g. a 2D NumPy array)
dm = Bio.Cluster.distancematrix(data, dist="s")  # "s" = Spearman's rank correlation

If you change the dist to "e", then it will calculate the Euclidean distance.
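To make the shape of the result concrete: distancematrix returns only the lower triangle, as a ragged list in which row i holds the i distances to the earlier items (so the first row is empty). Here is a rough pure-Python sketch of that layout. The function name and the toy data are made up for illustration, and it uses the textbook Euclidean distance, so it is not meant to reproduce Bio.Cluster's exact values.

```python
import math

def lower_triangle_distances(rows):
    """Return a ragged lower-triangle matrix: row i holds i distances."""
    matrix = []
    for i, a in enumerate(rows):
        # Distances from item i to each earlier item j < i (no diagonal).
        matrix.append([math.dist(a, b) for b in rows[:i]])
    return matrix

# Toy data, purely illustrative.
data = [[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]]
dm = lower_triangle_distances(data)
print(dm)  # [[], [5.0], [10.0, 5.0]]
```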

I thought there might be a way to output this in phylip format so I could use quicktree, but if there is, I wasn't able to find it. So here's mine:

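with open(filename, 'w') as fout:
    # First line: the number of sequences
    fout.write('%d\n' % len(names))
    for name, row in zip(names, dm):
        # One line per sequence: its name, then its lower-triangle distances
        fout.write(name)
        for value in row:
            fout.write('\t%s' % value)
        fout.write('\n')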

It assumes you have the distance matrix in the format created by the Bio.Cluster distancematrix function, and have a list of names for the sequences or matrices.
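If you want something reusable, the same idea can be wrapped in a function that writes to any file-like object. This is only a sketch: write_phylip_lower is a hypothetical name, the length check is my own addition, and the names and distances are invented for the example.

```python
import io

def write_phylip_lower(names, dm, handle):
    """Write a ragged lower-triangle matrix as tab-separated PHYLIP-style text."""
    if len(names) != len(dm):
        raise ValueError('need exactly one name per row of the matrix')
    # First line: the number of sequences.
    handle.write('%d\n' % len(names))
    for name, row in zip(names, dm):
        # One line per sequence: name, then distances to the earlier sequences.
        handle.write(name)
        for value in row:
            handle.write('\t%s' % value)
        handle.write('\n')

# Invented example values in the lower-triangle layout.
names = ['A', 'B', 'C']
dm = [[], [1.2], [0.8, 3.2]]
buf = io.StringIO()
write_phylip_lower(names, dm, buf)
print(buf.getvalue())
```

Passing an open file instead of the StringIO buffer writes the same text to disk.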

An example output would be:

B    1.2    0.8
C    3.2    1.6    2.0

The first value is the number of sequences in the distance matrix and the following lines are the lower triangle of a distance matrix, not including the diagonal (for which all the values would be 0).
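One way to sanity-check a file in this layout is to read it back and expand it into a full symmetric matrix. The sketch below assumes whitespace-separated fields and names without spaces; read_phylip_lower is a hypothetical helper and the sample text is invented.

```python
def read_phylip_lower(text):
    """Parse lower-triangle PHYLIP-style text into (names, full square matrix)."""
    lines = [line for line in text.splitlines() if line.strip()]
    n = int(lines[0])  # first line: number of sequences
    names = []
    # Start from all zeros, so the diagonal is already 0.
    full = [[0.0] * n for _ in range(n)]
    for i, line in enumerate(lines[1:n + 1]):
        fields = line.split()
        names.append(fields[0])
        # Mirror each lower-triangle value into the upper triangle.
        for j, value in enumerate(fields[1:]):
            full[i][j] = full[j][i] = float(value)
    return names, full

# Invented sample in the lower-triangle layout.
text = "3\nA\nB\t1.2\nC\t0.8\t3.2\n"
names, full = read_phylip_lower(text)
for name, row in zip(names, full):
    print(name, row)
```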




