Python Tricks

This group of pages does not really describe a programming project, but is a convenient place to record various Python tricks I've learnt. I hope they might also prove useful to other people.

I've called these code snippets tricks, but they are really just handy Python features used in simple but effective ways. They are methods that I have used frequently since learning them, and that I feel have improved the way I code.

There is a good list of so-called "hidden features" in Python on Stack Overflow: http://stackoverflow.com/questions/101268/hidden-features-of-python/1024693

Boolean indices

In Python, not only is 0 False and 1 True, but True is 1 and False is 0. This means you can do weird things like:

>>> a = 1
>>> (a==1) + (a>0) + (a==2)
2

I'm not sure why you'd want to do this unless you wanted to count how many conditions had been met. More usefully, you can use a boolean test as an index for a list or tuple. For example, rather than write:

if a % 2 == 0:
    print "a is even"
else:
    print "a is odd"

You can write:

print ("a is odd", "a is even")[a % 2 == 0]

Admittedly, this is probably less readable.
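Both uses rely on bool being a subclass of int. The following is only a minimal sketch of the two ideas above; the variable names are my own, and it uses print() with a single argument so that it runs on Python 2 and Python 3 alike.

```python
# bool is a subclass of int, so True and False behave as 1 and 0.
a = 1
conditions = [a == 1, a > 0, a == 2]

# Summing booleans counts how many of the conditions were met.
conditions_met = sum(conditions)

# A boolean used as an index picks item 0 (False) or item 1 (True).
parity = ("a is odd", "a is even")[a % 2 == 0]

print(conditions_met)
print(parity)
```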

Examples

An example of when I've found this trick useful is when I wanted to create a play/pause button. In response to a keystroke, I wanted to flip the value of a boolean variable called 'paused': if it was currently True then it should become False, and if it was False then it should become True. This can be achieved like so:

paused = (True, False)[paused]
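As a quick check of the toggle, here is a small sketch that pretends a key is pressed four times (the loop and the states list are purely illustrative). It also shows that the not operator gives the same flip, which is probably the more conventional spelling.

```python
paused = False
states = []

# Pretend the pause key is pressed four times.
for keystroke in range(4):
    paused = (True, False)[paused]  # index 0 if paused is False, 1 if True
    states.append(paused)

# The same flip written with the not operator.
toggled = not paused
```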

Another situation in which using a boolean test as an index might be useful is when you don't have the luxury of writing multiple lines of code, e.g. within a lambda function or list comprehension. For example:

>>> my_list = [1, 7, 11, 8, 13, 2]
>>> [("odd","even")[i % 2 == 0] for i in my_list]
['odd', 'odd', 'odd', 'even', 'odd', 'even']

A more useful example would be to threshold a list of data:

thresholded_data = [(0,1)[i > threshold] for i in my_list]

But in that situation, it's easier to just coerce the test into an integer:

thresholded_data = [int(i > threshold) for i in my_list]
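To see that the two forms agree, here is a small runnable sketch. The threshold value of 7 is an arbitrary choice for illustration.

```python
my_list = [1, 7, 11, 8, 13, 2]
threshold = 7  # arbitrary cut-off, chosen only for this example

# Boolean test used as an index into the tuple (0, 1).
indexed = [(0, 1)[i > threshold] for i in my_list]

# Boolean test coerced directly to an integer.
coerced = [int(i > threshold) for i in my_list]
```

Note that values equal to the threshold (the 7 here) fall below it because the test is a strict greater-than.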

Dictionary.get()

Like enumerate, I'm not sure that this can rightly be called a trick since it's just a built-in method, but it's one that I wasn't aware of until long after I first needed it. A dictionary's get() method looks up a key and returns a default value if the key isn't in the dictionary, which saves you from checking for the key yourself.

For example:

>>> numbers = {1: 'one', 2:'two', 3:'three'}
>>> print numbers.get(1, 'Number not defined')
one
>>> print numbers.get(4, 'Number not defined')
Number not defined

This is useful in all sorts of situations. One common case is counting how many times each item occurs in a group of items, for example to create a histogram. To count the frequency of letters in a string:

my_string = "I want to get the counts for each letter in this sentence"
counts = {}

for letter in my_string:
    counts[letter] = counts.get(letter, 0) + 1
print counts

Here, for each letter in my_string, you are getting the current count for that letter; if that letter hasn't yet been added to the counts dictionary, then 0 is returned.
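For comparison, the standard library's collections.Counter does the same counting in one call. This sketch just checks that the get()-based loop and Counter agree; Counter is not something the approach above depends on.

```python
from collections import Counter

my_string = "I want to get the counts for each letter in this sentence"

# Counting with dict.get(), as described above.
counts = {}
for letter in my_string:
    counts[letter] = counts.get(letter, 0) + 1

# The same tally using collections.Counter from the standard library.
counter = Counter(my_string)
```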

Enumerate

This is probably stretching the definition of 'trick', given that it's simply using a built-in Python function, but it's a very handy function that I didn't know about for some time, whilst wishing there was something exactly like it.

Python is great when it comes to traversing lists with for loops, but sometimes you want to know where in the list you are. I used to write code something like:

for n in range(len(my_list)):
    print n, my_list[n]

But it's much cleaner (and computationally more efficient, I believe):

for n, item in enumerate(my_list):
    print n, item

This function is particularly useful if you want to compare every item in a list to every other item in the list. For example, in my particle simulation, I wanted to test whether any particle in a list of my_particles overlapped with any other. The following code calls the collide function (which checks whether two particles overlap), with each pair of particles in the list.

for i, particle1 in enumerate(my_particles):
    for particle2 in my_particles[i+1:]:
        collide(particle1, particle2)

A further trick

A further 'trick' with enumerate is to pass a second parameter, which defines what number to start counting from. For example:

a = ['two', 'three', 'four']
for i, word in enumerate(a, 2):
    print i, word

Will print:

2 two
3 three
4 four

Therefore the code for particle collisions above can be made a bit cleaner:

for i, particle1 in enumerate(my_particles, 1):
    for particle2 in my_particles[i:]:
        collide(particle1, particle2)
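If all you need is every unordered pair, itertools.combinations from the standard library produces the same pairs in the same order. In this sketch the particles are replaced by placeholder strings and the pairs are collected in a list instead of being passed to collide, so that it can run on its own.

```python
from itertools import combinations

# Placeholder strings standing in for particle objects.
my_particles = ["p0", "p1", "p2", "p3"]

# Pairs produced by the enumerate-and-slice loop.
pairs_enumerate = []
for i, particle1 in enumerate(my_particles, 1):
    for particle2 in my_particles[i:]:
        pairs_enumerate.append((particle1, particle2))

# The same pairs from itertools.combinations.
pairs_combinations = list(combinations(my_particles, 2))
```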

List Comprehensions

I had known about Python list comprehensions for a while, but had avoided them, thinking them too complicated (to write and to read) and that I could achieve whatever list comprehensions achieve without them (which is true). However, now that I have got the hang of using them, I find them one of the most useful techniques in Python. I sometimes find myself replacing quite long loops or whole functions with a single-line list comprehension. I have even attempted to reduce Conway's Game of Life to a single line of Python using list comprehensions.

You can find good guides to list comprehensions elsewhere on the internet, but briefly, they generate a list using one or more for loops with optional conditions. For example, if you want a list of the first five square numbers, [1, 4, 9, 16, 25], you could use a for loop:

squares = []
for n in range(1,6):
    squares.append(n**2)

But you can do it in a single line with a list comprehension:

[n**2 for n in range(1,6)]

A real-life example of when I used a list comprehension was to build a list of all the codons (triplets of nucleotide bases):

bases = ['U', 'C', 'A', 'G']
codons = [a+b+c for a in bases for b in bases for c in bases]
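A quick check of the result: three nested loops over four bases should give 4 x 4 x 4 = 64 codons. The sketch below also builds the same list with itertools.product, which is an alternative I'm adding for comparison and not part of the original approach.

```python
from itertools import product

bases = ['U', 'C', 'A', 'G']

# Three nested for loops in a single list comprehension.
codons = [a + b + c for a in bases for b in bases for c in bases]

# The same triplets via itertools.product, joined back into strings.
codons_product = [''.join(triplet) for triplet in product(bases, repeat=3)]
```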

List comprehensions can be used to construct lists of arbitrary complexity and it can be very tempting once you get the hang of them, though I'm not sure it makes for very readable code. Below is some code I wrote to compress a list such that the first item in my new list was the average of the first ten items in the original list, the second was the average of the next ten items and so on. It's written such that I can change the 'bin size' from ten to whatever I want.

data = [10, 13, 15 ...]  # Several thousand numbers
bin = 10
compressed_data = [float(sum(data[n*bin:(n+1)*bin]))/bin for n in range(len(data)//bin)]

Whether this is particularly readable is debatable.
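One way to make it more readable is to wrap the comprehension in a function. This is only a sketch: the function name bin_averages and the sample data are made up for illustration, and the parameter is called bin_size to avoid shadowing Python's built-in bin function.

```python
def bin_averages(data, bin_size):
    """Return the mean of each consecutive group of bin_size items.

    Leftover items that do not fill a whole bin are dropped.
    """
    n_bins = len(data) // bin_size
    return [float(sum(data[n * bin_size:(n + 1) * bin_size])) / bin_size
            for n in range(n_bins)]

sample = list(range(1, 21))            # made-up data: the numbers 1 to 20
by_tens = bin_averages(sample, 10)     # two bins of ten items
by_fives = bin_averages(sample, 5)     # four bins of five items
```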

 

Setting decimal places

It is relatively easy to output a string to a specific number of decimal places using string formatting:

>>> "%.3f" % 3.1415927
'3.142'

If you want to round to a variable number of decimal places you can use string formatting twice:

def roundTo(n, dp):
    s = "%%.%df" % dp
    return s % n

print roundTo(3.1415927, 3)

This first generates a string using dp to define the number of decimal places (e.g. if dp is 3, then the string is "%.3f"). Note that %% is the code for %. Then it uses this string to round the number n as before.

The more recent string format function works like this:

"{0:.3f}".format(n)

And that allows you to create a roundTo function like this:

def roundTo(n, dp):
    return "{0:.{1}f}".format(n, dp)

It's not an easy-to-read expression. The inner {1} is replaced first by the second argument (dp, counting from 0), giving a format specification such as ".3f"; the outer {0} is then replaced by the first argument (n), formatted using that specification.
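Here is a runnable sketch of both approaches. The function names are my own; the second function uses the '*' precision of %-formatting, which reads the number of decimal places from the tuple and so avoids formatting twice.

```python
def round_to(n, dp):
    """Format n with dp decimal places using nested format fields."""
    return "{0:.{1}f}".format(n, dp)

def round_to_percent(n, dp):
    """Same idea with % formatting: '*' takes the precision from the tuple."""
    return "%.*f" % (dp, n)

pi_3dp = round_to(3.1415927, 3)
pi_1dp = round_to_percent(3.1415927, 1)
```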

Removing unnecessary decimal places

When making my SVG optimiser, I wanted to reduce the size of SVG files by reducing the number of decimal places (path coordinates were often given to six decimal places, when one is sufficient for most purposes). When testing the method above, I found that it could introduce unneeded decimal places, since every integer would have ".0" appended, which triples the length of a single-digit number.

My solution, and there may be better ones (such as using the decimal module), was to use a regular expression. I first defined a pattern to find trailing zeros, capturing any digits between the decimal point and the trailing zeros in one group and the trailing zeros themselves in a second.

import re
re_zeros = re.compile(r'\.(\d*?)(0+)$')

Then, given a string that has been formatted, it is searched for trailing zeros. The string is truncated by the number of trailing zeros, and if there aren't any other digits after the decimal point, it is truncated by a further place to remove the decimal point.

z = re_zeros.search(str_n)
if z:
    length = len(z.group(2)) + (len(z.group(1)) == 0)
    str_n = str_n[:-length]
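Putting the two pieces together gives a small function that can be tested on a few strings. The function name strip_trailing_zeros is my own; the pattern and the truncation logic are the ones described above, using the standard library's re module.

```python
import re

# Group 1: digits after the decimal point that must be kept.
# Group 2: the run of trailing zeros to remove.
RE_ZEROS = re.compile(r'\.(\d*?)(0+)$')

def strip_trailing_zeros(str_n):
    """Remove trailing zeros (and a bare decimal point) from a number string."""
    z = RE_ZEROS.search(str_n)
    if z:
        # The boolean adds 1 when no digits remain, removing the point too.
        length = len(z.group(2)) + (len(z.group(1)) == 0)
        str_n = str_n[:-length]
    return str_n
```

Strings without a decimal point, such as "10", are left alone because the pattern requires one.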

Sorting objects

Sorting lists in Python is very simple (list.sort()), but I often need to sort a list of objects based on one of the objects' attributes. I tried various messy, hacky methods before finding a simple one: passing sort a key function that returns the value to sort by.

Say you have a list of objects, each of which has an attribute called 'score'. You can sort the list by object score like so:

my_list.sort(key = lambda x: x.score)

This passes a lambda function to sort as its key: sort calls it on each object and orders the objects by the values it returns, here their score attributes. Otherwise, the sort function works exactly as normal (so it will, for example, order strings alphabetically).

You can also use this technique to sort a dictionary by its values:

sorted_keys = sorted(my_dict.keys(), key=lambda x: my_dict[x])
for k in sorted_keys:
    print my_dict[k]

The code creates a list of the dictionary keys, which it sorts based on the value for each key (note that simply sorting my_dict.keys() would order the keys themselves, not their values). Alternatively, you can loop through the keys and values in one go:

for k, v in sorted(my_dict.items(), key=lambda item: item[1]):
    print k, v
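The standard library's operator module offers attrgetter and itemgetter, which can stand in for these lambdas. The sketch below uses a made-up Player class and made-up data purely for illustration.

```python
from operator import attrgetter, itemgetter

class Player(object):
    """Hypothetical object with a score attribute, for illustration only."""
    def __init__(self, name, score):
        self.name = name
        self.score = score

my_list = [Player("Ann", 12), Player("Bob", 7), Player("Cat", 9)]

# Equivalent to key=lambda x: x.score
my_list.sort(key=attrgetter("score"))
names_by_score = [player.name for player in my_list]

# Sort (key, value) pairs by the value, i.e. by item 1 of each pair.
my_dict = {"a": 3, "b": 1, "c": 2}
items_by_value = sorted(my_dict.items(), key=itemgetter(1))
```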


Zipping and rezipping lists

I think a common mistake when coming to Python from some other programming languages is to loop through a list like this:

for i in range(len(my_list)):
    print my_list[i]

Rather than:

for x in my_list:
    print x

Whilst I knew not to do this, when trying to loop through two lists at the same time, I found myself resorting to this:

for i, x in enumerate(list1):
    print x, list2[i]

A much better solution is to zip the two lists together like this:

for (i, j) in zip(list1, list2):
    print i, j

A syntax that confused me for some time was zip(*my_list):

z = zip(list1, list2)
newlist1, newlist2 = zip(*z)

This works because the * syntax unpacks a list of values into separate arguments. The above code zips and unzips two lists, which is pointless, but the same syntax can be used to convert between a list of rows of data and a list of columns of data. For example, the following list comprehension reads in a file of tab-delimited data as a list of rows, where each row is a list of values:

rows = [line.rstrip().split('\t') for line in open(filename)]

If you want to flip the data through 90 degrees (i.e. convert from rows of data to columns of data), then you use:

columns = zip(*rows)

For example, if the data was originally (a, 1), (b, 2), (c, 3), it becomes (a, b, c), (1, 2, 3).
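The same example as runnable code. The call to list() is only needed on Python 3, where zip returns a lazy iterator instead of a list; on Python 2 it is harmless.

```python
rows = [("a", 1), ("b", 2), ("c", 3)]

# Unpack the rows as separate arguments to zip to get the columns.
columns = list(zip(*rows))

# Applying the same operation again flips the data back into rows.
rows_again = list(zip(*columns))
```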

And + or

In Python, and and or work in a slightly unusual way, which means they can be used to assign values.

Rather than write:

if n < 0:
    result = 'n is negative'
else:
    result = 'n is not negative'

You can write:

result = n < 0 and 'n is negative' or 'n is not negative'

The result is shorter, though I'm not sure it's more readable. However, the fact that it's a single line makes it more versatile. For example, you can include it in a list comprehension, as I did in this contrived example, or you can pass it as an argument to a function.

The general form is:

result = test and true_result or false_result

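The logic relies on the fact that and and or return one of their operands, not a plain True or False. If the test is true, then test and true_result evaluates to true_result, which (provided it counts as true) is returned without the or part being consulted; if the test is false, then test and true_result is false, so the or returns false_result. The catch is that the trick fails when true_result itself counts as false (e.g. 0, '' or None), in which case false_result is returned regardless of the test. Since Python 2.5, the conditional expression true_result if test else false_result avoids this problem.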
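One thing to watch for is a true_result that itself counts as false, such as an empty string. The sketch below wraps both forms in small helper functions (the names are mine) to compare the and/or trick with the conditional expression available since Python 2.5.

```python
def pick_and_or(test, true_result, false_result):
    """The and/or trick."""
    return test and true_result or false_result

def pick_conditional(test, true_result, false_result):
    """The conditional expression (Python 2.5 and later)."""
    return true_result if test else false_result

# Both forms agree when true_result counts as true.
normal = pick_and_or(True, "yes", "no")

# With an empty string as true_result, the and/or trick falls through
# to false_result even though the test is true.
surprise = pick_and_or(True, "", "no")
expected = pick_conditional(True, "", "no")
```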

There is a more detailed explanation of why this trick works at Dive Into Python.