Python multiprocessing lets you divide a workload among multiple processes, cutting down on overall execution time. This is especially useful for heavy calculations or for handling large datasets.

What is Python multiprocessing?

Multiprocessing in Python refers to running multiple processes simultaneously, allowing you to make the most of multicore systems. Unlike single-threaded approaches that handle tasks one by one, multiprocessing lets different parts of a program run in parallel, each independently. Every process gets its own memory space and its own Python interpreter, so it can run on a separate processor core and is not constrained by the global interpreter lock (GIL), slashing execution time for compute-heavy or time-sensitive operations.

Python multiprocessing has a wide range of applications. In data processing and analysis, it helps process large datasets faster and accelerates complex analyses. It is also used in simulations and modeling calculations (e.g., in scientific applications) to shorten the run times of complex computations. In addition to powering web scraping by fetching data from multiple sites simultaneously, it boosts efficiency in image processing and computer vision, resulting in quicker image analysis.


How to implement Python multiprocessing

Python offers several options for implementing multiprocessing. In the following sections, we'll introduce three common tools: the multiprocessing module, the concurrent.futures library, and the joblib package.

multiprocessing module

The multiprocessing module is Python's standard module for multiprocessing. With it, you can create processes, share data between them, and synchronize them using locks, queues, and other tools.

import multiprocessing

def task(n):
    result = n * n
    print(f"Result: {result}")

if __name__ == "__main__":
    processes = []
    for i in range(1, 6):
        # Create one process per number and start it immediately
        process = multiprocessing.Process(target=task, args=(i,))
        processes.append(process)
        process.start()
    # Wait for all processes to finish
    for process in processes:
        process.join()

In the example above, we use the multiprocessing.Process class to spawn processes that execute the task() function, which computes the square of a given number. After starting the processes, we wait for them to complete before proceeding with the main program. The result is displayed using an f-string, a Python string formatting method that embeds expressions. Note that the output order is non-deterministic, since the processes run independently. The if __name__ == "__main__": guard is important here: it prevents child processes from re-executing the process-creation code when the module is imported. You can also create a process pool with Python multiprocessing:

import multiprocessing

def task(n):
    return n * n

if __name__ == "__main__":
    # The pool distributes the calls across worker processes
    with multiprocessing.Pool() as pool:
        results = pool.map(task, range(1, 6))
        print(results)  # Output: [1, 4, 9, 16, 25]

With pool.map(), the task() function is applied to a sequence of inputs, and the results are collected in input order and printed.

concurrent.futures library

This module provides a high-level interface for asynchronous execution and parallel processing. Its ProcessPoolExecutor and ThreadPoolExecutor classes run tasks on a pool of processes or threads, respectively. In many cases, concurrent.futures is a simpler way to handle asynchronous tasks than the multiprocessing module.

import concurrent.futures

def task(n):
    return n * n

if __name__ == "__main__":
    with concurrent.futures.ProcessPoolExecutor() as executor:
        futures = [executor.submit(task, i) for i in range(1, 6)]
        for future in concurrent.futures.as_completed(futures):
            print(future.result())  # results appear in completion order

The code uses the concurrent.futures module to process tasks in parallel with the ProcessPoolExecutor. The task(n) function is submitted for the numbers 1 to 5. The as_completed() method yields each future as soon as its task finishes, so the results are printed in completion order rather than submission order.

joblib package

joblib is an external Python library designed to simplify parallel processing, for example, for repeatable tasks such as executing functions with different input parameters or working with large amounts of data. Its main features are the parallelization of tasks, the caching of function results, and the optimization of memory and computing resources.

from joblib import Parallel, delayed

def task(n):
    return n * n

# Run task() for the numbers 1 to 10 on up to four parallel workers
results = Parallel(n_jobs=4)(delayed(task)(i) for i in range(1, 11))
print(results)  # Output: squares of the numbers from 1 to 10

The expression Parallel(n_jobs=4)(delayed(task)(i) for i in range(1, 11)) starts the parallel execution of task() for the numbers 1 to 10. Parallel is configured with n_jobs=4, meaning up to four jobs run at the same time. Calling delayed(task)(i) creates a lazy call that is dispatched to a worker for each number i in the range. The results are collected in input order, stored in results, and printed.
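The result caching mentioned above is provided by joblib's Memory class. The following is a minimal sketch, assuming joblib is installed; the cache directory name and the slow_square() function are illustrative. Decorated functions store their results on disk, so repeated calls with the same arguments are read from the cache instead of being recomputed:

```python
from joblib import Memory, Parallel, delayed

# Cache function results on disk; "./joblib_cache" is an illustrative path
memory = Memory("./joblib_cache", verbose=0)

@memory.cache
def slow_square(n):
    return n * n

# First run computes; later runs with the same inputs hit the cache
results = Parallel(n_jobs=2)(delayed(slow_square)(i) for i in range(1, 6))
print(results)  # [1, 4, 9, 16, 25]
```

Combining Parallel with Memory in this way is useful when the same expensive computations may be requested more than once across runs.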
