Codon table


8 Dec 2010

Here's a quick code snippet to generate a codon table in Python. The 'table' is actually a dictionary that maps a three-letter, lowercase codon to a single uppercase letter corresponding to the encoded amino acid (or '*' if it's a stop codon).

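```python
bases = "tcag"
# All 64 three-letter combinations, in tcag order (the last base varies fastest).
codons = [a + b + c for a in bases for b in bases for c in bases]
# One amino acid letter per codon, in the same order; '*' marks a stop codon.
amino_acids = 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG'
codon_table = dict(zip(codons, amino_acids))
```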

So if you type codon_table['atg'], you'll get 'M' for methionine. If you prefer to use 'u' rather than 't', simply change bases to "ucag" in the first line.
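As a quick check of the lookup described above, here is a minimal sketch that builds the table with the RNA alphabet "ucag" instead of "tcag". The name rna_codon_table is just an illustrative choice, not something from the post.

```python
# Same construction as the post, but with 'u' in place of 't' (RNA alphabet).
bases = "ucag"
codons = [a + b + c for a in bases for b in bases for c in bases]
amino_acids = 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG'
rna_codon_table = dict(zip(codons, amino_acids))

print(len(rna_codon_table))     # 64
print(rna_codon_table['aug'])   # M (methionine)

# The three stop codons all map to '*'.
print(sorted(c for c, aa in rna_codon_table.items() if aa == '*'))  # ['uaa', 'uag', 'uga']
```

Only the alphabet changes; the amino acid string stays the same because the base order (u/t, c, a, g) is unchanged.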

It's now quite easy to make a function to translate a gene into an amino acid sequence.

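```python
def translate(seq):
    # Normalise the input: lowercase, with no line breaks or spaces.
    seq = seq.lower().replace('\n', '').replace(' ', '')
    peptide = ''
    # Step through the sequence one codon (three bases) at a time.
    for i in range(0, len(seq), 3):
        codon = seq[i: i+3]
        # Unknown or incomplete codons are treated like a stop codon.
        amino_acid = codon_table.get(codon, '*')
        if amino_acid != '*':
            peptide += amino_acid
        else:
            break
    return peptide
```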

This function takes a DNA sequence, converts it to lowercase and removes any line breaks or spaces. Then it loops through it in chunks of 3, i.e., codons, translating them until it hits a stop codon or a codon not in the dictionary (which includes an incomplete codon at the end of the sequence). It returns the amino acid sequence of the resulting peptide.
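A minimal usage sketch, assuming Python 3 (hence range rather than xrange). The table and function are restated so the snippet runs on its own, and the input sequences are made up purely for illustration.

```python
bases = "tcag"
codons = [a + b + c for a in bases for b in bases for c in bases]
amino_acids = 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG'
codon_table = dict(zip(codons, amino_acids))

def translate(seq):
    seq = seq.lower().replace('\n', '').replace(' ', '')
    peptide = ''
    for i in range(0, len(seq), 3):
        amino_acid = codon_table.get(seq[i:i + 3], '*')
        if amino_acid == '*':
            break  # stop codon, unknown codon or incomplete trailing codon
        peptide += amino_acid
    return peptide

# Spaces and line breaks are ignored; translation halts at the TAA stop codon.
print(translate("ATG GCC TTT\nTGG TAA GGG"))  # MAFW
# 'nnn' is not in the dictionary, so translation halts there too.
print(translate("ATGNNNGCC"))                 # M
# The trailing 'gc' is not a full codon, so it is dropped.
print(translate("atggc"))                     # M
```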

Comments (4)

Jay on 24 Sep 2011, 2:36 a.m.

Hi. Could you explain how the first part works to generate the codon table? It seems very useful and succinct but I just can't get my head round what's happening! How do the amino acids come to correspond with their respective codons in the dictionary? I understand zip works like this:

```python
>>> x = [1, 2, 3]
>>> y = [4, 5, 6]
>>> zipped = zip(x, y)
```

Therefore I think it is probably this line that I don't understand:

```python
codons = [a+b+c for a in bases for b in bases for c in bases]
```

Thanks!

Peter on 24 Sep 2011, 3:04 p.m.

The line codons = [a+b+c for a in bases for b in bases for c in bases] is indeed the key line. It is a list comprehension, which I describe at: http://www.petercollingridge.co.uk/python-tricks/list-comprehensions

Basically it is the equivalent of writing:

```python
codons = []
for a in bases:
    for b in bases:
        for c in bases:
            codons.append(a+b+c)
```
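To make the answer to Jay's question concrete, here is a small sketch checking that the list comprehension and the explicit loops build the same list, and showing how zip then pairs each codon with the letter at the same position in the amino acid string. The variable name looped is just illustrative.

```python
bases = "tcag"

# List comprehension from the post.
codons = [a + b + c for a in bases for b in bases for c in bases]

# Equivalent explicit loops: the innermost base (c) changes fastest.
looped = []
for a in bases:
    for b in bases:
        for c in bases:
            looped.append(a + b + c)

print(codons == looped)  # True
print(codons[:4])        # ['ttt', 'ttc', 'tta', 'ttg']

# zip pairs the i-th codon with the i-th amino acid letter, which is why
# the amino acid string must be written in this same tcag order.
amino_acids = 'FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG'
print(list(zip(codons, amino_acids))[:4])
# [('ttt', 'F'), ('ttc', 'F'), ('tta', 'L'), ('ttg', 'L')]
```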

Higa on 11 Dec 2013, 10:46 p.m.

This is a very simple and useful excerpt of code.

I was studying biology and wanted to make sure I understood RNA translation by writing a program to compute it. Your code avoided the boring part (I was going to write out all the codons one by one).

Thanks for sharing it.

Hilong on 6 Jul 2015, 5:29 p.m.

Another trick:

```python
import itertools
codons = [''.join(p) for p in itertools.product('tcag', repeat=3)]
```
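itertools.product yields tuples such as ('t', 't', 't') rather than strings, so each tuple needs joining before it can serve as a dictionary key like 'atg'. A quick sketch confirming that the joined product comes out in the same order as the list comprehension (product advances its rightmost element first, like nested loops):

```python
import itertools

bases = "tcag"
comprehension = [a + b + c for a in bases for b in bases for c in bases]

# repeat=3 is shorthand for passing bases three times.
print(next(itertools.product(bases, repeat=3)))  # ('t', 't', 't')

# Join each tuple into a string so it can be used as a codon_table key.
from_product = [''.join(p) for p in itertools.product(bases, repeat=3)]
print(from_product == comprehension)             # True
```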