11 - Functional programming in Python
Starting point
import collections

Scientist = collections.namedtuple('Scientist', [
    'name',
    'field',
    'born',
    'nobel'
])
scientists = (
    Scientist(name='Ada Lovelace', field='math', born=1815, nobel=False),
)
Mutable and immutable data structures
Lists and dictionaries are mutable. You have to be extra careful with mutable data in multithreaded programs, for example, because several threads may modify the same object at once. Immutable structures are more reliable in the sense that their state cannot change after they are created.
Hint
A namedtuple is an immutable data structure that offers the convenience of a class constructor with keyword arguments.
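A minimal sketch of what immutability means for a namedtuple, assuming the same Scientist fields as in the starting point. The replacement value 'computing' is only illustrative.

```python
import collections

# Same shape as the Scientist tuple from the starting point.
Scientist = collections.namedtuple('Scientist', ['name', 'field', 'born', 'nobel'])

ada = Scientist(name='Ada Lovelace', field='math', born=1815, nobel=False)

# Fields are read-only: assigning to one raises AttributeError.
try:
    ada.born = 1816
    mutated = True
except AttributeError:
    mutated = False

# _replace returns a new tuple and leaves the original untouched.
ada_copy = ada._replace(field='computing')

print(mutated, ada.field, ada_copy.field)
```

Because the original tuple never changes, it can be shared between threads or processes without locking.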
Filter, map and reduce
filter(function, iterable)
map(function, iterable, *iterables)
functools.reduce(function, iterable[, initializer])
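The three signatures above can be tried on a small data set. This is a sketch only: the second record (Marie Curie) is added here for illustration and is not part of the notes, and the variable names are hypothetical.

```python
from collections import namedtuple
from functools import reduce  # reduce is not a builtin in Python 3

Scientist = namedtuple('Scientist', ['name', 'field', 'born', 'nobel'])

# Illustrative data; only the first record comes from the notes.
scientists = (
    Scientist(name='Ada Lovelace', field='math', born=1815, nobel=False),
    Scientist(name='Marie Curie', field='physics', born=1867, nobel=True),
)

# filter: keep only the records for which the function returns a truthy value.
laureates = tuple(filter(lambda s: s.nobel, scientists))

# map: apply the function to every record.
names = tuple(map(lambda s: s.name.upper(), scientists))

# reduce: fold the records into a single value, starting from the initializer 0.
total_birth_years = reduce(lambda acc, s: acc + s.born, scientists, 0)

print(laureates)
print(names)
print(total_birth_years)
```

filter and map return lazy iterators, which is why the results are wrapped in tuple() before printing.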
The defaultdict class
This one is particularly useful with the reduce function. A defaultdict lets you specify how to initialize the value held by a key that is not in the dictionary yet.
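import collections
from functools import reduce

def reducer(acc, val):
    # defaultdict(list) creates an empty list the first time a field is seen
    acc[val.field].append(val.name)
    return acc

scientists_by_field = reduce(
    reducer,
    scientists,
    collections.defaultdict(list)
)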
print(scientists_by_field)
{
'astronomy': ['Vera Rubin'],
'chemistry': ['Tu Youyou', 'Ada Yonath']
}
The itertools.groupby helper function
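import itertools
by_field = lambda x: x.field
# groupby only merges adjacent items, so sort by the same key first
scientists_by_field = {
    field: list(group)  # group is a lazy iterator, so materialize it
    for field, group in itertools.groupby(sorted(scientists, key=by_field), by_field)
}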
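One thing to watch out for: itertools.groupby only groups consecutive items with the same key, so unsorted input yields the same key more than once (and a dict comprehension would then silently keep only the last group). A small sketch with hypothetical field labels:

```python
import itertools

# Hypothetical field labels, deliberately unsorted.
fields = ['math', 'physics', 'math']

# Without sorting, the two 'math' entries are not adjacent and form separate groups.
unsorted_groups = [
    (key, len(list(group))) for key, group in itertools.groupby(fields)
]

# Sorting first brings equal keys together.
sorted_groups = [
    (key, len(list(group))) for key, group in itertools.groupby(sorted(fields))
]

print(unsorted_groups)
print(sorted_groups)
```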
Functional constructs in multiprocessing library
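import multiprocessing

def transform(x):
    pass  # placeholder for the per-item work

# The guard keeps child processes from re-running this block on import
if __name__ == '__main__':
    with multiprocessing.Pool() as pool:
        result = pool.map(transform, scientists)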
Functional constructs in concurrent.futures library
The concurrent.futures module offers equivalent constructs for preemptive concurrency with threads (it also supports processes). Threads are much lighter than processes.
The nice thing about the concurrent.futures module is that it implements different concurrency strategies with similar constructions. For example, it is straightforward to switch from a multiprocess to a multithreaded strategy.
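import concurrent.futures

def transform(x):
    pass  # placeholder for the per-item work

if __name__ == '__main__':
    with concurrent.futures.ProcessPoolExecutor() as executor:
        # executor.map returns a lazy iterator over the results, in input order
        result = executor.map(transform, scientists)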
ThreadPoolExecutor
Instead of a ProcessPoolExecutor you can use a ThreadPoolExecutor. Threads are less expensive to create than processes. In the course example, the ThreadPoolExecutor runs in 1 second while the ProcessPoolExecutor runs in 2 seconds.
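A minimal sketch of the thread-based variant, assuming an I/O-bound transform simulated with time.sleep. The items, the 0.2 second delay and the worker count are illustrative choices, not the course's benchmark.

```python
import concurrent.futures
import time

def transform(x):
    # Stand-in for an I/O-bound call (for example a network request).
    time.sleep(0.2)
    return x * 2

items = [1, 2, 3, 4]

start = time.perf_counter()
with concurrent.futures.ThreadPoolExecutor(max_workers=4) as executor:
    # executor.map yields results in input order.
    result = list(executor.map(transform, items))
elapsed = time.perf_counter() - start

print(result, f'{elapsed:.2f}s')
```

Since the threads spend their time sleeping, the four calls overlap and the total should be close to a single delay rather than the sum of all four. Swapping in ProcessPoolExecutor only changes the class name, although on platforms that spawn new processes the pool code then typically needs to sit under an if __name__ == '__main__' guard.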
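IO-bound vs. computation-bound
In scenarios in which the tasks being computed are computationally intensive, ProcessPoolExecutor would be a better choice in Python because of the GIL (global interpreter lock), which in CPython lets only one thread execute Python bytecode at a time.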
Exercises
- Create a minimal example in which it is easier to achieve concurrency with immutable data + functional programming than with mutable data + object-oriented or procedural programming.
- Code an example in which you use the reduce function together with a defaultdict.
- Code the same example you did above but using itertools.groupby.
- Use the multiprocessing pool.map to parallelize the transformation of some list and compare with the thread implementation. Assume that it is an IO-bound operation (you can add some sleep to the transform function).