10 - Concurrency in Python

Python provides three standard libraries to deal with concurrency.

  • threading and asyncio deal with I/O-bound workloads.
  • multiprocessing deals with CPU-bound workloads by running multiple processes.

Introduction

The computer is made of parts that operate at very different speeds. The processor executes millions of instructions per second, but it may take a few milliseconds to fetch something from disk, and even longer over the network.

The main strategy behind concurrency is to take advantage of these speed differences, coordinating processes so that the CPU stays busy most of the time. This is called time slicing, and it is what gives the illusion of concurrency.

Time slicing strategies

  • No strategy: a single program runs at a time.
  • Cooperative (or shared): processes send signals to the OS to tell it when they can be moved to a wait state.
  • Preemptive: the OS decides when to put a process in a wait state.

The problem with the cooperative strategy is that the system blocks when a process crashes, or when a process starts something computationally demanding without bothering to signal the OS once in a while.
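The cooperative idea can be sketched with plain generators: each task voluntarily yields control, and a scheduler resumes tasks in turn. The names task and round_robin are made up for illustration.

```python
from collections import deque

def task(name, steps):
    # a cooperative task: yields control back to the scheduler after each step
    for i in range(steps):
        yield f"{name}:{i}"

def round_robin(tasks):
    # minimal cooperative scheduler: run each task until it yields, then requeue it
    queue = deque(tasks)
    trace = []
    while queue:
        t = queue.popleft()
        try:
            trace.append(next(t))
            queue.append(t)  # the task yielded cooperatively; reschedule it
        except StopIteration:
            pass  # the task finished; drop it
    return trace

print(round_robin([task("A", 2), task("B", 2)]))  # ['A:0', 'B:0', 'A:1', 'B:1']
```

Note that a task that never yields would starve every other task, which is exactly the weakness of cooperative scheduling described above.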

Producer - Worker - Consumer

This is the basic pattern for concurrent programs. The producer generates data for the workers, and the workers do the processing that eventually produces an output sent to the consumer.

Think about the processing of a very large image.

  • The producer divides the image into several chunks.
  • Each chunk is processed independently by one of the workers.
  • The consumer receives all the processed chunks and aggregates them.

This is the simplest arrangement. You may also have to coordinate the workers; for example, the workers may need to talk to each other to process the boundaries between chunks.
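A minimal sketch of the producer-worker-consumer pattern using two queues and a pool of threads; the names (process_chunks, the `chunk * 2` processing) are placeholders for a real workload.

```python
import queue
import threading

def producer(chunks, work_q, n_workers):
    # hand each chunk to the workers, then send one stop sentinel per worker
    for chunk in chunks:
        work_q.put(chunk)
    for _ in range(n_workers):
        work_q.put(None)

def worker(work_q, result_q):
    while True:
        chunk = work_q.get()
        if chunk is None:  # sentinel: no more work
            break
        result_q.put(chunk * 2)  # stand-in for the real chunk processing

def process_chunks(chunks, n_workers=3):
    work_q, result_q = queue.Queue(), queue.Queue()
    workers = [threading.Thread(target=worker, args=(work_q, result_q))
               for _ in range(n_workers)]
    for w in workers:
        w.start()
    producer(chunks, work_q, n_workers)
    for w in workers:
        w.join()
    # consumer: drain and aggregate all processed chunks
    return sorted(result_q.get() for _ in range(len(chunks)))

print(process_chunks([1, 2, 3, 4]))  # [2, 4, 6, 8]
```

The sentinel `None` per worker is one common way to shut the pool down cleanly once the producer runs out of data.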

Global Interpreter Lock (GIL)

Python can be extended using C: some libraries implement their code in C but expose a Python interface. If a race condition occurs during memory management somewhere in that C code (for example, on an object's reference count), the interpreter's state can be corrupted. The GIL was introduced to prevent this type of problem: only one thread may execute Python bytecode at a time.

That is bad because it limits concurrency in Python.

The threading library

import concurrent.futures
import requests
import threading

thread_local = threading.local()

def get_session():
    if not hasattr(thread_local, "session"):
        thread_local.session = requests.Session()

    return thread_local.session

def download_site(url):
    session = get_session()
    with session.get(url) as response:
        indicator = "J" if "jython" in url else "R"
        print(indicator, sep='', end='', flush=True)

def download_all_sites(sites):
    with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor:
        executor.map(download_site, sites)

if __name__ == '__main__':
    sites = [
        "https://www.jython.org",
        "http://olympus.realpython.org/dice"
    ] * 80

    download_all_sites(sites)

The map method expects an iterable; each element is passed as the argument to the function given in the first argument. To pass multiple arguments, supply one iterable per parameter.
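A quick sketch of map with multiple iterables (the power function is made up for illustration): the iterables are consumed element-wise, like zip.

```python
import concurrent.futures

def power(base, exp):
    return base ** exp

with concurrent.futures.ThreadPoolExecutor(max_workers=2) as executor:
    # one iterable per parameter: power(2, 5), power(3, 2), power(4, 3)
    results = list(executor.map(power, [2, 3, 4], [5, 2, 3]))

print(results)  # [32, 9, 64]
```

map returns a lazy iterator of results in input order, which is why the example wraps it in list.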

The library also offers lower-level primitives such as Thread with its start and join methods, but it is usually simpler and less error-prone to stick with the higher-level ThreadPoolExecutor.

Internally, threading.local() makes thread_local hold a different instance in each thread. Used in conjunction with ThreadPoolExecutor, you don't need to worry about thread orchestration yourself.
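A small sketch of the thread-local behaviour (record and the sleep duration are arbitrary): each thread stores its own value in thread_local and reads back its own value, not another thread's.

```python
import threading
import time

thread_local = threading.local()
results = {}

def record(value):
    thread_local.value = value           # stored per-thread, not globally
    time.sleep(0.01)                     # give other threads time to set theirs
    results[value] = thread_local.value  # still our own value, never clobbered

threads = [threading.Thread(target=record, args=(i,)) for i in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(results)  # {0: 0, 1: 1, 2: 2, 3: 3} (insertion order may vary)
```

With a plain global instead of threading.local(), the value read after the sleep could belong to whichever thread wrote last.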

The asyncio library

The threading library implements the preemptive strategy for concurrency and relies on operating-system threads. The asyncio library implements the cooperative strategy and is independent of the OS.

Hint

Operating systems should use the preemptive strategy among their running processes, to avoid the issues discussed earlier. But a program can use the cooperative strategy to implement concurrency among its own tasks (or its own spawned processes).

import asyncio
import aiohttp

async def download_site(session, url):
    async with session.get(url) as response:
        indicator = "J" if "jython" in url else "R"
        print(indicator, sep='', end='', flush=True)

async def download_all_sites(sites):
    async with aiohttp.ClientSession() as session:
        tasks = []
        for url in sites:
            task = asyncio.create_task(download_site(session, url))
            tasks.append(task)

        await asyncio.gather(*tasks, return_exceptions=True)

if __name__ == '__main__':
    sites = [
        "https://www.jython.org",
        "http://olympus.realpython.org/dice"
    ] * 80

    asyncio.run(download_all_sites(sites))

The dis library

The name refers to "disassemble". This module lets us see which bytecode instructions our Python code is compiled to. These bytecode instructions are what the Python virtual machine executes.

from dis import dis

def square(x):
    return x * x

dis(square)
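Besides printing a listing, the module can be used programmatically through dis.Bytecode; a sketch (the square function is the same toy example as above):

```python
import dis

def square(x):
    return x * x

# Bytecode yields Instruction objects, so we can inspect opcodes instead of printing
opnames = [instr.opname for instr in dis.Bytecode(square)]
print(opnames)
```

The exact opcode names vary between CPython versions, but a function that returns a computed value always ends with a RETURN_VALUE instruction.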

Multiprocessing in Python

The multiprocessing library runs your code in separate OS processes, each with its own address space. That means that each process will have

  • Their own Python runtime.
  • Their own stack.
  • Their own heap.
  • Their own set of byte-code.

That means that:

  • Duplication of data across address spaces; duplication of stored byte-code.
  • Interprocess communication is done by serializing and deserializing data.

Info

Multiprocessing scales in the sense that each new process can run on an available core of the processor, but it does not by itself solve the idle-time problem (most processes spend their time waiting for some expensive task, e.g. I/O, to complete).

import multiprocessing
import requests

session = None

def set_global_session():
    global session
    if not session:
        session = requests.Session()

def download_site(url):
    with session.get(url) as response:
        indicator = multiprocessing.current_process().name[-1]
        print(indicator, sep='', end='', flush=True)

def download_all_sites(sites):
    with multiprocessing.Pool(initializer=set_global_session) as pool:
        pool.map(download_site, sites)


if __name__ == '__main__':
    sites = [
        "https://www.jython.org",
        "http://olympus.realpython.org/dice"
    ] * 80

    download_all_sites(sites)

Info

Python offers mechanisms for exchanging data between processes: Queue and Pipe pass serialized messages, while Value and Array expose actual shared memory.
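A minimal sketch of interprocess communication with a Pipe (worker and demo are illustrative names): the child computes a value and sends it back; the data crosses the process boundary by being pickled and unpickled.

```python
import multiprocessing

def worker(conn, x):
    # child process: compute and send the result back through the pipe
    conn.send(x * x)
    conn.close()

def demo(x):
    parent_conn, child_conn = multiprocessing.Pipe()
    p = multiprocessing.Process(target=worker, args=(child_conn, x))
    p.start()
    result = parent_conn.recv()  # pickled in the child, unpickled here
    p.join()
    return result

if __name__ == '__main__':
    print(demo(3))  # 9
```

A Queue works similarly but is safe for many producers and consumers, whereas a Pipe connects exactly two endpoints.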

The advantages of threads

Threads eliminate the duplication of the heap and of byte-code storage. They also eliminate the need to serialize and deserialize data, since all memory is shared between threads.

In Python this advantage is undermined by the GIL: only one thread can execute Python bytecode at any given time.

The threading library implements a preemptive threading strategy, which is managed by the operating system.

CPU-bound vs. I/O-bound workloads

Threads and the cooperative strategy of asyncio work fine when you have I/O-bound workloads, that is, when part of your processing relies on a task executed by a peripheral that is orders of magnitude slower than the CPU (disks, networks...).

On the other hand, if all your workload runs on the CPU, then you are likely better off using multiprocessing instead. In that case, using threads (or the cooperative strategy) will likely slow your program down.
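A sketch of farming a CPU-bound function out to a process pool (cpu_heavy and run_parallel are made-up names, and the sum of squares is a stand-in for real work):

```python
import multiprocessing

def cpu_heavy(n):
    # stand-in for a CPU-bound computation
    return sum(i * i for i in range(n))

def run_parallel(inputs):
    # by default, one worker process per available core, each with its own interpreter
    with multiprocessing.Pool() as pool:
        return pool.map(cpu_heavy, inputs)

if __name__ == '__main__':
    print(run_parallel([10, 100]))
```

Because each worker is a separate process with its own GIL, the computations can truly run in parallel on multiple cores.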

Danger

The statement above is only true for Python, because of the GIL. Usually, the operating system will allocate each thread of a multithreaded process to a different CPU core, so threads of the same process can truly execute simultaneously. That explains why my face-recognition algorithm in Python didn't speed up using threads, but magLac did.

Concurrency and Parallelism

The following is an extract from an external source:

In a multithreaded process on a single processor, the processor can switch execution resources between threads, resulting in concurrent execution. Concurrency indicates that more than one thread is making progress, but the threads are not actually running simultaneously. The switching between threads happens quickly enough that the threads might appear to run simultaneously.

In the same multithreaded process in a shared-memory multiprocessor environment, each thread in the process can run concurrently on a separate processor, resulting in parallel execution, which is true simultaneous execution. When the number of threads in a process is less than or equal to the number of processors available, the operating system's thread support system ensures that each thread runs on a different processor. For example, in a matrix multiplication that is programmed with four threads, and runs on a system that has two dual-core processors, each software thread can run simultaneously on the four processor cores to compute a row of the result at the same time.

Concurrency != Parallelism

Concurrency is a trick: it is the time slicing done by processors that gives the illusion that multiple tasks are being executed at the same time. Parallelism is when two tasks run simultaneously, for example one on each core of your processor.

Exercise

  • What does the term coroutine really mean?
  • Read an article explaining the GIL (Global Interpreter Lock) in Python.
  • Read an article about the different Python interpreters out there (for example, CPython).
  • How does the GIL limit Python's concurrency capabilities?
  • How come threading is OS-dependent while asyncio is not?

Answer: The concurrency implemented by the asyncio library uses the cooperative multitasking strategy and is managed by the Python interpreter instead of the OS. In the Python context, using asyncio is actually preferred over threading because it carries less overhead than the threads started by threading (which are undermined by the GIL anyway).
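To make "coroutine" concrete, a minimal sketch (fetch, main, and the delays are made-up names and values): each coroutine suspends itself at await, handing control to the event loop so the other can make progress.

```python
import asyncio

async def fetch(delay, value):
    # a coroutine suspends itself at `await`, yielding control to the event loop
    await asyncio.sleep(delay)
    return value

async def main():
    # both coroutines wait concurrently: total time is ~max(delay), not the sum
    return await asyncio.gather(fetch(0.01, "a"), fetch(0.02, "b"))

print(asyncio.run(main()))  # ['a', 'b']
```

gather returns the results in the order the coroutines were passed in, regardless of which one finished first.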