Here's a quick code snippet to generate a codon table in Python. The 'table' is actually a dictionary that maps a three-letter, lowercase codon to a single uppercase letter corresponding to the encoded amino acid (or '*' if it's a stop codon).
bases = "tcag"
codons = [a + b + c for a in bases for b in bases for c in bases]
amino_acids = 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG'
codon_table = dict(zip(codons, amino_acids))
So if you type codon_table['atg'], you'll get "M" for methionine. If you prefer to use 'u' rather than 't', simply replace the 't' in the bases string on the first line with 'u'.
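As a quick sanity check, here is a minimal sketch of the RNA variant of the table. The names rna_codon_table and stop_codons are only illustrative and are not part of the snippet above; everything else is the same construction with 'u' swapped in for 't'.

```python
# Same construction as the DNA table, but with RNA bases ('u' instead of 't').
bases = "ucag"
codons = [a + b + c for a in bases for b in bases for c in bases]
amino_acids = 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG'
rna_codon_table = dict(zip(codons, amino_acids))  # illustrative name

# Collect the codons that map to '*' (stop codons).
stop_codons = sorted(c for c, aa in rna_codon_table.items() if aa == '*')

print(len(rna_codon_table))    # 64
print(rna_codon_table['aug'])  # M
print(stop_codons)             # ['uaa', 'uag', 'uga']
```

The table has 4 x 4 x 4 = 64 entries, three of which are stop codons.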
It's now quite easy to make a function to translate a gene into an amino acid sequence.
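def translate(seq):
    # Normalise the input: lowercase, no line breaks or spaces
    seq = seq.lower().replace('\n', '').replace(' ', '')
    peptide = ''
    # Step through the sequence one codon (3 bases) at a time;
    # range works in both Python 2 and Python 3
    for i in range(0, len(seq), 3):
        codon = seq[i: i+3]
        # Unknown or incomplete codons are treated like a stop codon
        amino_acid = codon_table.get(codon, '*')
        if amino_acid != '*':
            peptide += amino_acid
        else:
            break
    return peptide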
This function takes a DNA sequence, converts it to lowercase and removes any line breaks or spaces. Then it loops through it in chunks of 3, i.e. codons, translating them until it hits a stop codon or a codon not in the dictionary. It returns the amino acid sequence of the resulting peptide.
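Here is a short usage sketch of the behaviour described above, restated so that it runs on its own. It assumes Python 3, and the example sequences are made up purely for illustration.

```python
bases = "tcag"
codons = [a + b + c for a in bases for b in bases for c in bases]
amino_acids = 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG'
codon_table = dict(zip(codons, amino_acids))

def translate(seq):
    # Lowercase the sequence and strip line breaks and spaces
    seq = seq.lower().replace('\n', '').replace(' ', '')
    peptide = ''
    for i in range(0, len(seq), 3):
        # Codons missing from the table (e.g. a trailing 'gc') count as a stop
        amino_acid = codon_table.get(seq[i:i + 3], '*')
        if amino_acid == '*':
            break
        peptide += amino_acid
    return peptide

print(translate('ATG GCC TGG TAA GGG'))  # MAW  (stops at 'taa')
print(translate('atggc'))                # M    (incomplete final codon ends translation)
```

Note that anything after the first stop codon is ignored, so the trailing GGG in the first example never gets translated.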
Comments (4)
Jay on 24 Sep 2011, 2:36 a.m.
Hi. Could you explain how the first part works to generate the codon table? It seems very useful and succinct but I just can't get my head round what's happening! How do the amino acids come to correspond with their respective codons in the dictionary? I understand zip works like this:
>>> x = [1, 2, 3]
>>> y = [4, 5, 6]
>>> zipped = zip(x, y)
Therefore I think it is probably this line:
codons = [a+b+c for a in bases for b in bases for c in bases]
That I don't understand
Thanks!
Peter on 24 Sep 2011, 3:04 p.m.
The line codons = [a+b+c for a in bases for b in bases for c in bases] is indeed the key line. It is a list comprehension, which I describe at: http://www.petercollingridge.co.uk/python-tricks/list-comprehensions
Basically it is the equivalent of writing:
codons = []
for a in bases:
    for b in bases:
        for c in bases:
            codons.append(a+b+c)
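To see how the amino acids end up matched to the right codons, a small sketch like the following may help. It assumes Python 3, and the variable comprehension is just an illustrative name. The loops generate the codons in a fixed order (the last base changes fastest), and the amino acid string is written in that same order, so zip pairs them position by position.

```python
bases = "tcag"
amino_acids = 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG'

# Nested loops: the innermost loop (third base) changes fastest
codons = []
for a in bases:
    for b in bases:
        for c in bases:
            codons.append(a + b + c)

# The list comprehension produces exactly the same list
comprehension = [a + b + c for a in bases for b in bases for c in bases]
print(codons == comprehension)  # True

print(codons[:4])  # ['ttt', 'ttc', 'tta', 'ttg']

# zip pairs the i-th codon with the i-th letter of amino_acids
print(list(zip(codons, amino_acids))[:4])
# [('ttt', 'F'), ('ttc', 'F'), ('tta', 'L'), ('ttg', 'L')]
```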
Higa on 11 Dec 2013, 10:46 p.m.
This is a very simple and useful excerpt of code.
I was studying biology and wanted to make sure I understood RNA translation by making a program to compute it. Your code saved me the boring part (I was going to write out all the codons, one by one).
Thanks for sharing it.
Hilong on 6 Jul 2015, 5:29 p.m.
Another trick:
import itertools
codons = itertools.product('tcag', 'tcag', 'tcag')
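One thing to watch with this approach: itertools.product yields tuples such as ('t', 't', 't') rather than strings, and it returns an iterator rather than a list. A minimal sketch, assuming you want the same list of three-letter strings as in the original post, would join each tuple:

```python
import itertools

# product(..., repeat=3) is equivalent to product('tcag', 'tcag', 'tcag');
# each result is a tuple of three characters, so join it into a string
codons = [''.join(triplet) for triplet in itertools.product('tcag', repeat=3)]

print(codons[:4])   # ['ttt', 'ttc', 'tta', 'ttg']
print(len(codons))  # 64
```

The order matches the nested loops, so the result can be zipped with the amino acid string in the same way.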