samwellwang

Python Multithreading

In Python, multithreading and multiprocessing are the two most common approaches to concurrent programming. Both can make a program run faster, but they work in very different ways. This article looks at the differences between them and explains why Python multithreading is often called "fake multithreading." I first ran into the issue when I was learning Python and found that a multithreaded data-processing job ran slower than the single-threaded version.

Differences between Multithreading and Multiprocessing#

Multithreading and multiprocessing are both ways of concurrent programming, but their implementation methods are quite different. Multiprocessing refers to running multiple processes simultaneously in the operating system, each process having its own independent address space and system resources. Multithreading refers to running multiple threads simultaneously in the same process, with each thread sharing the same address space and system resources.

Specifically, multiprocessing achieves concurrency by creating new processes. Each process has its own independent address space and system resources, and inter-process communication (IPC) mechanisms are used for communication between processes. Multithreading achieves concurrency by creating new threads. All threads share the same address space and system resources, and communication between threads is done through shared memory and synchronization mechanisms.
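As a minimal sketch of what "shared memory and synchronization mechanisms" means for threads, the example below lets four threads update one global counter and uses a threading.Lock to keep the updates from interfering with each other. The names counter, counter_lock and add_many are made up for this illustration.

```python
import threading

counter = 0                      # shared by every thread in this process
counter_lock = threading.Lock()  # guards updates to `counter`

def add_many(n):
    """Increment the shared counter n times, taking the lock for each update."""
    global counter
    for _ in range(n):
        with counter_lock:
            counter += 1

threads = [threading.Thread(target=add_many, args=(10_000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # 40000: every increment from every thread is visible
```

Separate processes would each get their own copy of counter, which is why they need an IPC mechanism such as a queue or pipe to exchange data.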

Because each process has its own address space and system resources, a crash or memory corruption in one process does not directly affect the others, so multiprocessing offers better isolation and stability. The trade-off is that creating and destroying processes, and passing data between them, costs considerably more than doing the same with threads. Multiprocessing therefore pays off mainly for CPU-intensive tasks, where running on several cores at once outweighs that overhead. Multithreading is the better fit for IO-intensive tasks, because threads are cheap to create and switching between them costs less than switching between processes.

Why Multithreading is "Fake Multithreading"#

In CPython, the reference implementation of Python, the Global Interpreter Lock (GIL) prevents threads from running Python code in parallel. The GIL is a lock inside the interpreter that allows only one thread to execute Python bytecode at a time. As a result, at any given moment only one thread is actually executing Python code, even on a multi-core CPU.

Because of the GIL, multithreading in Python is often called "fake multithreading." Multiple threads can exist in the same process and take turns running, but they cannot execute Python code in parallel. Therefore, using multithreading in Python does not speed up CPU-intensive tasks, and the extra lock contention and thread switching can even make them slower than single-threaded code.
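One rough way to observe this is to time the same CPU-bound work run sequentially and then split across threads. The sketch below uses hypothetical helpers (count_down, run_sequential, run_threaded) and an arbitrary loop size. On a standard CPython build with the GIL, the threaded run would be expected to take about as long as the sequential one or longer, but the actual numbers depend on the machine and Python version.

```python
import threading
import time

def count_down(n):
    """Pure-Python busy loop: CPU-bound, never waits on IO."""
    while n > 0:
        n -= 1

def run_sequential(n, repeats):
    """Run the loop `repeats` times in the current thread; return elapsed seconds."""
    start = time.perf_counter()
    for _ in range(repeats):
        count_down(n)
    return time.perf_counter() - start

def run_threaded(n, repeats):
    """Run the loop once in each of `repeats` threads; return elapsed seconds."""
    threads = [threading.Thread(target=count_down, args=(n,)) for _ in range(repeats)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start

if __name__ == "__main__":
    N = 2_000_000  # arbitrary workload size, chosen only for illustration
    print(f"sequential: {run_sequential(N, 4):.3f}s")
    print(f"4 threads:  {run_threaded(N, 4):.3f}s")
```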

However, for IO-intensive tasks, multithreading in Python is still useful. Such tasks spend most of their time waiting for IO operations to complete, and a thread releases the GIL while it is blocked on IO, so other threads can run in the meantime. In this case, the GIL has little impact on the program's execution efficiency.
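The effect can be sketched with time.sleep() standing in for a blocking IO call (like real blocking IO, it releases the GIL while waiting). The function names fake_download and timed_threaded and the delay values are assumptions made for this illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def fake_download(seconds):
    """Stand-in for a blocking IO call; sleeping releases the GIL."""
    time.sleep(seconds)
    return seconds

def timed_threaded(delays):
    """Run one fake download per thread and return (results, elapsed seconds)."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=len(delays)) as executor:
        results = list(executor.map(fake_download, delays))
    return results, time.perf_counter() - start

delays = [0.2] * 5
results, elapsed = timed_threaded(delays)
print(f"total wait requested: {sum(delays):.1f}s, wall time: {elapsed:.2f}s")
```

Because the five waits overlap, the wall time should be close to a single delay rather than the sum of all of them.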

Ways to Start Multithreading and Multiprocessing#

In Python, there are multiple ways to start multithreading or multiprocessing. Below, we will introduce their advantages and disadvantages.

Ways to Start Multithreading#

1. Using the threading module#

import threading

def worker():
    print("I'm working")

# Create the thread, start it, then wait for it to finish
t = threading.Thread(target=worker)
t.start()
t.join()

Creating a new thread with the threading module is simple: create a Thread object, pass the function to run as the target argument, and call start(). However, because of the GIL, these threads cannot execute Python bytecode in parallel.
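A Thread object does not hand back the return value of its target, so real workers usually take arguments and report their results some other way. The following is one possible sketch, using a thread-safe queue.Queue; the square worker and the variable names are hypothetical.

```python
import queue
import threading

def square(n, out):
    """Hypothetical worker: put (input, result) on a thread-safe queue."""
    out.put((n, n * n))

results = queue.Queue()
# `args` supplies the positional arguments passed to the target function
threads = [threading.Thread(target=square, args=(n, results)) for n in range(5)]
for t in threads:
    t.start()
for t in threads:
    t.join()  # wait until every worker has finished

# Results arrive in completion order, so collect them into a dict keyed by input
squares = dict(results.get() for _ in range(5))
print(squares)
```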

2. Using the concurrent.futures module#

from concurrent.futures import ThreadPoolExecutor

def worker():
    print("I'm working")

with ThreadPoolExecutor() as executor:
    executor.submit(worker)

The concurrent.futures module offers a higher-level interface: the ThreadPoolExecutor class manages a pool of reusable threads, and submit() returns a Future object from which the result can be retrieved. However, because of the GIL, the threads in the pool still cannot execute Python bytecode in parallel.
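To show how results come back from a thread pool, here is a small sketch that uses submit() with as_completed() and, alternatively, map(). The word_length function and the sample words are invented for the example.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def word_length(word):
    """Hypothetical task: return the word together with its length."""
    return word, len(word)

words = ["thread", "process", "gil"]

with ThreadPoolExecutor(max_workers=3) as executor:
    # submit() schedules one call and returns a Future immediately
    futures = [executor.submit(word_length, w) for w in words]
    # as_completed() yields each Future as soon as its result is ready
    lengths = dict(f.result() for f in as_completed(futures))
    # map() is shorter and returns results in input order
    ordered = list(executor.map(len, words))

print(lengths)
print(ordered)
```

submit() suits cases where tasks finish at different times and each result should be handled as soon as it is ready, while map() is convenient when the same function is applied to every input and order matters.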

Ways to Start Multiprocessing#

1. Using the multiprocessing module#

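import multiprocessing

def worker():
    print("I'm working")

# The guard keeps child processes from re-running this block when they
# import the module (required with the "spawn" start method)
if __name__ == '__main__':
    p = multiprocessing.Process(target=worker)
    p.start()
    p.join()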

Creating a new process with the multiprocessing module is just as simple: create a Process object and pass the function to run as the target argument. Since each process has its own address space and its own interpreter (with its own GIL), processes can run Python code in parallel on multiple cores.

2. Using the concurrent.futures module#

from concurrent.futures import ProcessPoolExecutor

def worker():
    print("I'm working")

# Guard the entry point so worker processes do not re-run it on import
if __name__ == '__main__':
    with ProcessPoolExecutor() as executor:
        executor.submit(worker)

With the concurrent.futures module, the ProcessPoolExecutor class manages a pool of worker processes through the same interface as ThreadPoolExecutor. Since each process has its own address space and its own interpreter, the workers can run Python code in parallel.

Code Example#

Here is an example that runs several pi calculations in parallel using the multiprocessing module:

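import multiprocessing
import time

def calc_pi(digits):
    # Note: `digits` is the number of Leibniz-series terms to sum,
    # not the number of correct digits in the result
    start = time.perf_counter()
    pi = 0
    for k in range(digits):
        pi += ((-1) ** k) / (2 * k + 1)
    pi *= 4
    end = time.perf_counter()
    print(f"Digits: {digits}, Time: {end - start:.4f}s")
    return pi

if __name__ == '__main__':
    # Pool() starts one worker process per available CPU by default
    with multiprocessing.Pool() as pool:
        digits_list = [100000, 200000, 300000, 400000, 500000]
        # map() splits the inputs across the workers and keeps result order
        results = pool.map(calc_pi, digits_list)
        for digits, pi in zip(digits_list, results):
            print(f"Digits: {digits}, Pi: {pi:.10f}")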

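In this example, we use the multiprocessing.Pool class to create a pool of worker processes (by default, one per available CPU) and use the map() method to call the calc_pi() function for each element in digits_list. Because the elements are distributed across the worker processes, the calculations can run in parallel. Note that each value in digits_list is the number of series terms to sum, not the number of correct digits of pi.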
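To check whether the pool actually helps on a given machine, the same workload can be timed serially and through the pool. This is only a sketch: leibniz_pi, timed and the workload sizes are names and values chosen for this illustration, and the speedup (if any) depends on the number of cores and on process start-up cost.

```python
import multiprocessing
import time

def leibniz_pi(terms):
    """Approximate pi with the first `terms` terms of the Leibniz series."""
    total = 0.0
    for k in range(terms):
        total += (-1) ** k / (2 * k + 1)
    return 4 * total

def timed(label, func):
    """Call func(), print how long it took, and return its result."""
    start = time.perf_counter()
    result = func()
    print(f"{label}: {time.perf_counter() - start:.3f}s")
    return result

if __name__ == "__main__":
    workloads = [200_000] * 8  # arbitrary sizes, for illustration only
    serial = timed("serial", lambda: [leibniz_pi(n) for n in workloads])
    with multiprocessing.Pool() as pool:
        parallel = timed("pool", lambda: pool.map(leibniz_pi, workloads))
    # Both approaches compute the same values; only the wall time differs
    assert serial == parallel
```

For very small workloads the pool can be slower than the serial loop, because starting processes and transferring data between them has a fixed cost.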

Summary#

This article delved into the differences between Python multithreading and multiprocessing, and explained why multithreading is called "fake multithreading." We also introduced various methods of starting multithreading or multiprocessing and provided code examples. In practical development, the appropriate concurrent programming method should be chosen based on the type of task to improve program execution efficiency.

Update on 20230712#

According to today's news, Meta has committed to providing three engineer-years of support for removing the GIL if PEP 703 is accepted, so now we are waiting for the Python community to decide on the proposal.
