Codementor Events

Python Quick Tip: Simple ThreadPool Parallelism

Published Oct 28, 2015; last updated Feb 09, 2017

Parallelism isn't always easy, but if we break our code down into a function that can be applied over a map, we can adjust it to run in parallel with very little effort!

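map is a built-in higher-order function that applies a given function to each element of an iterable. In Python 2 it returns a list of results; in Python 3 it returns a lazy iterator, which you can wrap in list() to get a list.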
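For comparison, here is the plain, sequential built-in map applied to the same kind of squaring function used later in this post (the snake_case name square_number is just for this sketch). In Python 3 the result is wrapped in list(), because map returns an iterator.

```python
def square_number(n):
    # The function applied to every element
    return n ** 2


numbers = [1, 2, 3, 4, 5]

# map() calls square_number on each element, one after another, in a single thread.
# In Python 3 it returns a lazy iterator, so list() materializes the results.
squared = list(map(square_number, numbers))
print(squared)  # [1, 4, 9, 16, 25]
```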

The multiprocessing library is usually used to run work in separate processes, but it also has a neat dummy module, multiprocessing.dummy, that offers the same Pool interface backed by threads. In fact, it's so simple that you only need to set the number of threads and pass in the function to be mapped over; the pool does all of the hard work of handing out the items and collecting the results.

Don't believe me? Check it out for yourself. Let's square a bunch of numbers in parallel:

Code Example

from multiprocessing.dummy import Pool as ThreadPool

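# function to be mapped over each element
def squareNumber(n):
    return n ** 2

def calculateParallel(numbers, threads=2):
    pool = ThreadPool(threads)  # pool of worker threads
    results = pool.map(squareNumber, numbers)  # blocks until every item is processed
    pool.close()  # no more tasks will be submitted
    pool.join()  # wait for the worker threads to exit
    return results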

if __name__ == "__main__":
    numbers = [1, 2, 3, 4, 5]
    squaredNumbers = calculateParallel(numbers, 4)
    for n in squaredNumbers:
        print(n)
    

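The number of threads is usually set to the number of cores you have; if your processor has hyperthreading, you can double it. Keep in mind, though, that in CPython the global interpreter lock (GIL) allows only one thread to execute Python bytecode at a time. A thread pool therefore speeds up work that spends most of its time waiting on I/O (network requests, disk reads), while CPU-bound work such as squaring numbers gets little or no speedup from threads; multiprocessing.Pool, which uses separate processes, is the usual choice there.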
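Because of the GIL, the case where a thread pool clearly pays off is I/O-bound work. The sketch below assumes a hypothetical fake_download function that sleeps for half a second to stand in for a network request; with four threads, four such calls can overlap, so the batch should take roughly half a second instead of about two.

```python
import time
from multiprocessing.dummy import Pool as ThreadPool


def fake_download(url):
    """Hypothetical stand-in for an I/O-bound call such as an HTTP request.

    time.sleep releases the GIL, so other threads can run while this one waits.
    """
    time.sleep(0.5)
    return len(url)


def fetch_all(urls, threads=4):
    pool = ThreadPool(threads)
    try:
        # Each URL is handed to a worker thread; the sleeps overlap.
        return pool.map(fake_download, urls)
    finally:
        pool.close()
        pool.join()


if __name__ == "__main__":
    urls = ["https://example.com/a", "https://example.com/bb",
            "https://example.com/ccc", "https://example.com/dddd"]
    start = time.perf_counter()
    sizes = fetch_all(urls, threads=4)
    elapsed = time.perf_counter() - start
    print(sizes)
    # Expected to be close to 0.5s with 4 threads, versus about 2s sequentially.
    print(f"took {elapsed:.2f}s")
```

For CPU-bound work, the same pattern applies with `from multiprocessing import Pool`, which runs the function in separate processes. In that case the pool should be created under an `if __name__ == "__main__":` guard, and the mapped function must be defined at module level so it can be pickled.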

The results returned will simply be a standard list of all of the squared numbers in this example. Too easy, right?
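pool.map behaves like the built-in map in one important way: the results come back in input order, even if the worker threads finish out of order. The sketch below makes the first items the slowest (the slow_square helper and its sleep durations are purely illustrative) to show that the ordering is preserved.

```python
import time
from multiprocessing.dummy import Pool as ThreadPool


def slow_square(n):
    # Hypothetical helper: smaller numbers sleep longer, so they finish last.
    time.sleep(0.05 * (5 - n))
    return n ** 2


def calculate_ordered(numbers, threads=5):
    pool = ThreadPool(threads)
    try:
        # map blocks until every task is done and returns a plain list
        # whose positions match the input, regardless of completion order.
        return pool.map(slow_square, numbers)
    finally:
        pool.close()
        pool.join()


if __name__ == "__main__":
    print(calculate_ordered([1, 2, 3, 4, 5]))  # [1, 4, 9, 16, 25]
```

If you would rather handle each result as soon as it is ready, Pool also offers imap_unordered, which yields results in completion order instead.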

Discover and read more posts from Lance Pioch
Mike
6 years ago

When we write “import multiprocessing” - we get no error. When we write your line, “from multiprocessing.dummy…” - we get an error “no module named multiprocessing.” What’s the trick to get your import to work?

Lance Pioch
6 years ago

Hey Mike, I get no errors when importing that on both PY 2.7.10 and 3.4.2. Some quick googling pointed out that you’re most likely missing some files. Other users were able to fix it by either adding the modules in manually and others by reinstalling Python. Thanks.

Vivek Kumar Singh
7 years ago

Shouldn’t this be pool.join() and then pool.close()?

Yifei Kong
7 years ago

no, it shouldn’t

ahavic
7 years ago

It is worth mentioning that the example code won’t execute in ‘parallel’ since it is a CPU-bound task.
As a matter of fact, it won’t execute in parallel whether the task is CPU- or IO-bound, as long as threads are being used.

Mike
6 years ago

Does the multiprocessing module specifically bypass the GIL thus preventing it from being CPU bound? (as opposed to the threading module which doesn’t)
