11 - Functional programming in Python

Starting point

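import collections

# The variable name must match the constructor used below (Scientist, not Scientists).
Scientist = collections.namedtuple('Scientist', [
    'name',
    'field',
    'born',
    'nobel',
])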

scientists = (
    Scientist(name='Ada Lovelace', field='math', born=1815, nobel=False),
)

Mutable and immutable data structures

Lists and dictionaries are mutable. Mutable data needs extra care whenever it is shared, for example between threads, because one thread can change it while another is reading it. Immutable structures are more reliable in the sense that their state cannot change after they are created.

Hint

A namedtuple is an immutable data structure that offers the convenience of a class constructor with keyword arguments.
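
To make the immutability concrete, here is a minimal sketch that reuses the Scientist definition from the starting point. The names ada, ada_updated, and mutated are introduced here only for illustration.

```python
import collections

# Same shape as the starting-point definition.
Scientist = collections.namedtuple('Scientist', ['name', 'field', 'born', 'nobel'])

ada = Scientist(name='Ada Lovelace', field='math', born=1815, nobel=False)

# Assigning to a field fails: namedtuple instances cannot be changed in place.
try:
    ada.field = 'computing'
    mutated = True
except AttributeError:
    mutated = False

# _replace returns a new instance and leaves the original untouched.
ada_updated = ada._replace(field='computing')

print(mutated)            # False
print(ada.field)          # math
print(ada_updated.field)  # computing
```

Because every "change" produces a new object, code holding a reference to ada never sees it change.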

Filter, map and reduce

filter(function, iterable)
map(function, iterable, *iterables)
functools.reduce(function, iterable[, initializer])  # in Python 3, reduce must be imported from functools
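
The three signatures above can be sketched on a small tuple of scientists. The sample tuple people and the result names (winners, names, total_birth_years) are assumptions made for this illustration, not part of the course material.

```python
import collections
from functools import reduce  # reduce is not a builtin in Python 3

Scientist = collections.namedtuple('Scientist', ['name', 'field', 'born', 'nobel'])

# Sample data assumed for illustration only.
people = (
    Scientist(name='Ada Lovelace', field='math', born=1815, nobel=False),
    Scientist(name='Vera Rubin', field='astronomy', born=1928, nobel=False),
    Scientist(name='Ada Yonath', field='chemistry', born=1939, nobel=True),
)

# filter keeps the items for which the function returns a true value.
winners = tuple(filter(lambda s: s.nobel, people))

# map applies the function to every item.
names = tuple(map(lambda s: s.name, people))

# reduce folds all items into a single value, starting from the initializer (0).
total_birth_years = reduce(lambda acc, s: acc + s.born, people, 0)

print([s.name for s in winners])  # ['Ada Yonath']
print(names)                      # ('Ada Lovelace', 'Vera Rubin', 'Ada Yonath')
print(total_birth_years)          # 5682
```

Both filter and map return lazy iterators, which is why the sketch wraps them in tuple() before printing.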

The defaultdict class

This one is particularly useful with the reduce function. A defaultdict lets you specify how to initialize the value held by a key that is not in the dictionary yet.

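import collections
from functools import reduce

def reducer(acc, val):
    # Append to this field's list; defaultdict(list) creates the list on first access.
    acc[val.field].append(val.name)
    return acc

scientists_by_field = reduce(
    reducer,
    scientists,
    collections.defaultdict(list)
)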
print(scientists_by_field)
{
    'astronomy': ['Vera Rubin'],
    'chemistry': ['Tu Youyou', 'Ada Yonath']
}
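
To see the default-value behaviour on its own, separate from reduce, here is a minimal sketch. The variable names groups and counts are chosen only for this example.

```python
import collections

groups = collections.defaultdict(list)  # list() builds the default value: []
counts = collections.defaultdict(int)   # int() builds the default value: 0

# No KeyError: the missing key is initialized on first access.
groups['math'].append('Ada Lovelace')
counts['math'] += 1

print(dict(groups))         # {'math': ['Ada Lovelace']}
print(dict(counts))         # {'math': 1}
print('physics' in groups)  # False: membership tests do not create keys
```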

The itertools.groupby helper function

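import itertools

# groupby only merges consecutive items, so sort by the same key first.
scientists_by_field = {
    field: list(group)  # group is a lazy iterator, so materialize it
    for field, group in itertools.groupby(
        sorted(scientists, key=lambda x: x.field), lambda x: x.field)
}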
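
itertools.groupby only groups consecutive items with equal keys, which is easy to overlook. The sketch below contrasts unsorted and sorted input. The sample tuple people and the helper field_of are assumptions for illustration, and the field labels follow the sample output shown earlier.

```python
import collections
import itertools

Scientist = collections.namedtuple('Scientist', ['name', 'field', 'born', 'nobel'])

# Sample data assumed for illustration; 'chemistry' is deliberately not contiguous.
people = (
    Scientist(name='Ada Yonath', field='chemistry', born=1939, nobel=True),
    Scientist(name='Vera Rubin', field='astronomy', born=1928, nobel=False),
    Scientist(name='Tu Youyou', field='chemistry', born=1930, nobel=True),
)

def field_of(s):
    return s.field

# Unsorted input: 'chemistry' forms two separate runs, and the second run
# overwrites the first one in the resulting dict.
unsorted_groups = {
    field: [s.name for s in group]
    for field, group in itertools.groupby(people, field_of)
}

# Sorted input: each field forms exactly one run.
sorted_groups = {
    field: [s.name for s in group]
    for field, group in itertools.groupby(sorted(people, key=field_of), field_of)
}

print(unsorted_groups)  # {'chemistry': ['Tu Youyou'], 'astronomy': ['Vera Rubin']}
print(sorted_groups)    # {'astronomy': ['Vera Rubin'], 'chemistry': ['Ada Yonath', 'Tu Youyou']}
```

The unsorted version silently drops Ada Yonath, so sorting by the grouping key first is the safer default.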

Functional constructs in the multiprocessing library

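import multiprocessing

def transform(x):
    pass  # placeholder: return the transformed item here

if __name__ == '__main__':  # needed on platforms that spawn workers (Windows, macOS)
    with multiprocessing.Pool() as pool:
        result = pool.map(transform, scientists)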

Functional constructs in the concurrent.futures library

The concurrent.futures module offers equivalent constructs for preemptive concurrency with threads, although it can also use processes. Threads are much lighter than processes.

The nice thing about the concurrent.futures module is that it exposes different concurrency strategies through the same interface. For example, switching from a multiprocess to a multithreaded strategy only requires swapping the executor class.

import concurrent.futures

def transform(x):
    pass

with concurrent.futures.ProcessPoolExecutor() as executor:
    result = executor.map(transform, scientists)

ThreadPoolExecutor

Instead of a ProcessPoolExecutor you can use a ThreadPoolExecutor. Threads are less expensive to create than processes. In the course example, the ThreadPoolExecutor runs in 1 second while the ProcessPoolExecutor runs in 2 seconds.
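
As a rough sketch of the thread-based variant, the example below simulates an IO-bound task with time.sleep. The helper run, its executor_cls parameter, the sleep duration, and the worker count are all assumptions made for illustration. This is not the course's own benchmark, and timings will vary by machine.

```python
import concurrent.futures
import time

def transform(x):
    # Stand-in for an IO-bound task: sleeping simulates waiting on a network or disk.
    time.sleep(0.2)
    return x * 2

def run(executor_cls, items):
    # executor_cls is any concurrent.futures executor class.
    start = time.perf_counter()
    with executor_cls(max_workers=4) as executor:
        # executor.map is lazy, so collect the results inside the with block.
        results = list(executor.map(transform, items))
    return results, time.perf_counter() - start

if __name__ == '__main__':
    results, elapsed = run(concurrent.futures.ThreadPoolExecutor, [1, 2, 3, 4])
    print(results)  # [2, 4, 6, 8]
    print(f'{elapsed:.2f}s')
```

Passing concurrent.futures.ProcessPoolExecutor to run instead is the only change needed to switch strategies, which makes it easy to time both on the same workload.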

IO-bound vs. computation-bound

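When the tasks are computationally intensive, ProcessPoolExecutor is the better choice in Python because of the GIL (global interpreter lock): in standard CPython only one thread executes Python bytecode at a time, so threads cannot spread CPU-bound work across cores. For IO-bound tasks, threads spend most of their time waiting and release the GIL while they do, so ThreadPoolExecutor is usually sufficient and cheaper.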

Exercises

  • Create a minimal example in which it is easier to achieve concurrency with immutable data + functional programming than with mutable data + object-oriented or procedural programming.
  • Code an example in which you use the reduce function together with a defaultdict.
  • Code the same example you did above, but using itertools.groupby.
  • Code the multiprocessing pool.map to parallelize the transformation of some list and compare with the thread implementation. Assume that it is an IO-bound operation (you can add some sleep to the transform function).