Finding the reverse complement

I wrote this small function to get the reverse complement of a DNA sequence. It's probably not the most efficient way to do it, but I like it because it's compact. It will work for upper- or lower-case sequences, but it will always return a lowercase sequence. It removes any characters it doesn't recognise, which is useful if you have line numbers in the sequence.

def reverseComplement(sequence):
  complement = {'a':'t','c':'g','g':'c','t':'a','n':'n'}
  return "".join([complement.get(nt.lower(), '') for nt in sequence[::-1]])

This function works by going through a string (or list) backwards and replacing each letter using a dictionary. Any character not in the dictionary, such as a space, line break or number, is ignored, so a single continuous string is returned.
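As a quick check of that behaviour, here is a small usage sketch. The function is repeated so the snippet runs on its own, and the input strings are made up for illustration.

```python
def reverseComplement(sequence):
    # Look up each base in reverse order; unknown characters map to ''
    complement = {'a': 't', 'c': 'g', 'g': 'c', 't': 'a', 'n': 'n'}
    return "".join([complement.get(nt.lower(), '') for nt in sequence[::-1]])

# Upper-case input comes back in lower case
print(reverseComplement("ATGCN"))  # ngcat

# Line numbers, spaces and line breaks are dropped
numbered = "1 atgc\n5 ggta"
print(reverseComplement(numbered))  # taccgcat
```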

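Another option is to use the translate() string method:

complement = str.maketrans('atcgn', 'tagcn')

def reverseComplement(sequence):
    return sequence.lower().translate(complement)[::-1]

This function works by creating a translation table with str.maketrans() (string.maketrans() in Python 2, which also needs import string) and using it to translate the lowercased sequence, which is then reversed. Unlike the dictionary version, characters that are not in the table are left in place rather than removed.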
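To also strip line numbers and whitespace with translate(), one possible sketch (assuming Python 3) uses the optional third argument of str.maketrans(), which lists characters to delete. The name reverseComplementClean is made up for this example, and only digits and whitespace are removed, not every unrecognised character.

```python
import string

# The third argument lists characters to delete during translation
complement = str.maketrans('atcgn', 'tagcn', string.digits + string.whitespace)

def reverseComplementClean(sequence):
    # Lowercase, complement (dropping digits and whitespace), then reverse
    return sequence.lower().translate(complement)[::-1]

print(reverseComplementClean("1 ATGC\n5 GGTA"))  # taccgcat
```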
