[This article is very old now, and probably very out of date. Hopefully the principles are basically sound.]
I've recently got a multi-core laptop, so was keen to try some parallel processing using Python. It's pretty simple; you just need to use:
os.fork()
However, the difficult part is working out what happens after the fork, and how to build a program around it. When the program reaches the `os.fork()` call, it splits into two identical copies. But generally you don't want two copies of a program doing exactly the same thing; you want two programs doing slightly different things. Even trying to create differences using random numbers is problematic.
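One plausible reading of the remark about random numbers is that a forked child inherits a copy of the parent's random number generator state, so every child draws the same "random" value. The sketch below illustrates that under some assumptions: it is written for Python 3, the helper name `draw_in_children` and the seed `12345` are made up for the example, and it uses its own `random.Random` instance because, as far as I know, recent Python 3 versions reseed the module-level generator in a forked child. The child is recognised by a return value of 0 from `os.fork()`, and reports its draw through its exit status.

```python
import os
import random

def draw_in_children(rng, count=3, reseed=False):
    """Fork `count` children; each reports one draw from `rng` via its exit status."""
    draws = []
    for _ in range(count):
        pid = os.fork()
        if pid == 0:
            # Child: it holds a copy of the parent's generator state
            if reseed:
                rng.seed(os.urandom(16))  # fresh entropy for this child only
            os._exit(rng.randrange(256))  # exit statuses are limited to 0-255
        # Parent: its own copy of rng is never advanced between forks
        _, status = os.waitpid(pid, 0)
        draws.append(os.WEXITSTATUS(status))
    return draws

rng = random.Random(12345)
same = draw_in_children(rng)
print(same)  # three identical numbers

different = draw_in_children(rng, reseed=True)
print(different)  # usually three different numbers
```

Reseeding inside each child is one way around the problem; seeding from the child's own process ID would be another.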
Differentiating between processes
Naturally, there is a way to differentiate between the parent and child processes: when `os.fork()` is called, it returns 0 to the child process and the ID of the child process to the parent.
import os
pid = os.fork()
print pid
As a result, it's possible to make the parent and child processes do different things. For example, the following will write two different files with different outputs:
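import os
pid = os.fork()
if pid == 0:
    # In the child, fork() returned 0, so ask the OS for our own ID
    fout = open('child.txt', 'w')
    fout.write('File created by child process %d' % os.getpid())
else:
    # In the parent, pid holds the child's ID, not the parent's
    fout = open('parent.txt', 'w')
    fout.write('File created by parent process %d' % os.getpid())
# Both processes run these lines, each on its own file
fout.write('\nEnd of file')
fout.close()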
Waiting for a child process
If you've created a child process, the chances are you want the parent to wait for it to finish whatever it's doing before the parent continues. For this you need to use `os.waitpid(pid, 0)`. For example:
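import os, time
def timeConsumingFunction():
    x = 1
    for n in xrange(10000000):
        x += 1
pid = os.fork()
if pid > 0:
    # Parent: remember the child's ID so we can wait for it
    child = pid
else:
    # Child: do the work, then exit so it never runs the parent's code below
    timeConsumingFunction()
    os._exit(0)
start_time = time.time()
os.waitpid(child, 0)
print time.time() - start_time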
Here, the parent process splits off a child which counts to ten million, while the parent waits. Once the child has finished calling `timeConsumingFunction`, it exits with `os._exit(0)`. Note that `os._exit(0)` is used for child processes instead of `sys.exit(0)`: it ends the process immediately, without running cleanup handlers or flushing stdio buffers that were copied from the parent. The 0 indicates that the process has exited without errors. Once the child has finished, the parent prints the time it spent waiting for the child.
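The exit code passed to `os._exit()` can be read back in the parent from the status that `os.waitpid()` returns. Here is a minimal sketch, assuming Python 3; the helper name `run_child` is made up for the example.

```python
import os

def run_child(exit_code):
    """Fork a child that exits at once with `exit_code`; return the code the parent sees."""
    pid = os.fork()
    if pid == 0:
        # Child: leave immediately with the requested code
        os._exit(exit_code)
    # Parent: waitpid returns (pid, status); the status is an encoded value
    _, status = os.waitpid(pid, 0)
    if os.WIFEXITED(status):
        # The child exited normally, so the exit code can be extracted
        return os.WEXITSTATUS(status)
    return None  # the child was ended some other way, e.g. by a signal

print(run_child(0))  # 0
print(run_child(3))  # 3
```

A non-zero code gives the parent a cheap way to notice that a child failed.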
Multiple forks
To create multiple forks, we can use a loop. In this case, using `os._exit(0)` is vital to ensure that the child processes don't continue the loop, forking off even more children.
import os, time
NUM_PROCESSES = 7
def timeConsumingFunction():
    x = 1
    for n in xrange(10000000):
        x += 1
children = []
start_time = time.time()
for process in range(NUM_PROCESSES):
    pid = os.fork()
    if pid:
        # Parent: record the child's ID and carry on forking
        children.append(pid)
    else:
        # Child: do the work, then exit without continuing the loop
        timeConsumingFunction()
        os._exit(0)
# Wait for every child before reporting the total elapsed time
for child in children:
    os.waitpid(child, 0)
print time.time() - start_time
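A natural follow-up is how to cap the number of children running at once. The sketch below is one possible approach, not the only one. It assumes Python 3, and the names `MAX_RUNNING`, `NUM_TASKS`, `small_task` and `run_with_cap` are made up for the example. It relies on `os.wait()`, which blocks until any one child exits, to free a slot before forking the next child.

```python
import os

MAX_RUNNING = 3  # hypothetical cap on simultaneous children
NUM_TASKS = 8    # hypothetical number of jobs to run

def small_task(n):
    # Stand-in for real work
    total = 0
    for i in range(100000):
        total += i * n

def run_with_cap(num_tasks, max_running):
    running = set()  # IDs of children that have not been reaped yet
    finished = 0
    peak = 0         # largest number of children alive at once
    for n in range(num_tasks):
        # At the cap: block until any one child exits before forking again
        while len(running) >= max_running:
            pid, _status = os.wait()
            if pid in running:
                running.remove(pid)
                finished += 1
        pid = os.fork()
        if pid == 0:
            small_task(n)
            os._exit(0)  # child must not continue the loop
        running.add(pid)
        peak = max(peak, len(running))
    # Reap whatever is still running
    while running:
        pid, _status = os.wait()
        if pid in running:
            running.remove(pid)
            finished += 1
    return finished, peak

finished, peak = run_with_cap(NUM_TASKS, MAX_RUNNING)
print(finished, peak)  # 8 3
```

If the parent needs to stay responsive instead of blocking, `os.waitpid(-1, os.WNOHANG)` returns `(0, 0)` straight away when no child has exited yet, so it can be polled in a loop with a short sleep.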
Comments (9)
Rajaseelan on 1 Feb 2012, 2:33 p.m.
Thanks dude. I'm still using Python 2.4 due to CentOS 5 and OS restrictions. :(
This is better than trying to install the backported processing module, making the script more portable :)
d on 30 Jun 2013, 9:08 p.m.
Awesome explanation man! Thanks!
bookworm on 4 Jul 2013, 9:34 p.m.
This is somewhat incorrect:
pid = os.fork()
if pid > 0:
    fout = open('child.txt', 'w')
    fout.write('File created by child process %d' % pid)
else:
    fout = open('parent.txt', 'w')
    fout.write('File created by parent process')
From the docs, it's the other way around:
"Return 0 in the child and the child’s process id in the parent."
Sorry to make a fuss about 3-year-old code :)
evaristoc on 24 May 2014, 6:13 p.m.
Thanks anyway to all!!! Old, but there are SOOO many things to learn...!! Pufff...
Andreas on 7 Nov 2015, 2:32 a.m.
Thanks Peter,
your article helped me a lot in solving a problem in an efficient way!
Best regards,
Andreas
Anonymous on 30 Apr 2016, 10:50 p.m.
Thanks for the example.
Probably this line: print time.time() - t
should be corrected: print time.time() - start_time
DINESH PRADHAN on 8 May 2016, 3:20 a.m.
Love your work !!
Anonymous on 15 Feb 2017, 11:19 a.m.
A few years later but I come with a question:)
How do you limit things so that, for example, a maximum of 5 processes are running a task at a time?
If you waitpid on the process, you block the main thread's execution.
Is there any way to check in a loop the number of running forked processes and, if the number < "predefined number", fork a new one, and if not just sleep again for a while?
Heikki on 4 Jun 2018, 11:31 a.m.
You forgot to define the start_time variable in your first example.