Multiprocessing in Python

Hitesh Mishra
4 min read · Nov 30, 2021


Why Python Multiprocessing
  • In Python, single-CPU use is caused by the global interpreter lock (GIL), which allows only one thread to hold control of the Python interpreter at any given time. The GIL was introduced to solve a memory-management problem, but as a result, pure-Python code is effectively limited to a single processor.
  • The multiprocessing module allows the programmer to fully leverage multiple processors on a given machine. Its API is similar to the classic threading module, and it offers both local and remote concurrency.
  • The multiprocessing module avoids the limitations of the Global Interpreter Lock (GIL) by using subprocesses instead of threads. Note that multiprocessing code does not execute in the same order as serial code: there is no guarantee that the first process to be created will be the first to complete.

Why Is Multiprocessing Useful?

What is the fastest way to count all numbers from 1 to 1000?

  • Python uses only one core at a time for this kind of work. Think about which approach is faster for a simple math problem.
  • Summing the numbers one by one, adding each value to a running total (1+2=3, 3+3=6, 6+4=10, and so on). One core works on the whole task.
  • Splitting the values into chunks beforehand and summing each chunk separately (1 to 300, 301 to 600, and 601 to 1000). Three cores work at the same time (the last step is to add the three partial sums together).

A good starting point would be to learn how the multiprocessing library in Python works.
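The three-chunk idea above can be sketched with the library's Pool class. This is an illustrative sketch: the chunk boundaries and the chunk_sum helper are made up for this example, not part of any standard API.

```python
from multiprocessing import Pool

def chunk_sum(bounds):
    # Sum the integers in the inclusive range [start, end].
    start, end = bounds
    return sum(range(start, end + 1))

if __name__ == '__main__':
    # The three chunks described above; each can run on its own core.
    chunks = [(1, 300), (301, 600), (601, 1000)]
    with Pool(3) as pool:
        partial_sums = pool.map(chunk_sum, chunks)
    # The last step: add the three partial sums together.
    print(sum(partial_sums))  # 500500, the same as sum(range(1, 1001))
```

For a sum this small, process start-up overhead dwarfs the computation; the pattern pays off when each chunk involves genuinely heavy work.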

What Should You Use?

  • If your code has a lot of I/O or Network usage: Multithreading is your best bet because of its low overhead
  • If you have a GUI: Multithreading so your UI thread doesn’t get locked up
  • If your code is CPU bound: You should use multiprocessing (if your machine has multiple cores)
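A rough way to see the CPU-bound case for yourself is to time the same work under threads and under processes. This is a sketch; the exact timings depend on your machine and core count.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from multiprocessing import Pool

def cpu_task(n):
    # Pure computation: threads cannot run this in parallel because of the GIL.
    return sum(i * i for i in range(n))

if __name__ == '__main__':
    work = [2_000_000] * 4

    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=4) as ex:
        list(ex.map(cpu_task, work))
    print('threads:  ', time.perf_counter() - start)

    start = time.perf_counter()
    with Pool(4) as pool:
        pool.map(cpu_task, work)
    print('processes:', time.perf_counter() - start)
```

On a multi-core machine the process version typically finishes faster. For I/O-bound work the ordering usually reverses, since threads release the GIL while waiting.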

Multiprocessing Examples

  • Multiprocessing is a package that supports spawning processes using an API similar to the threading module.
  • The multiprocessing package offers both local and remote concurrency, effectively side-stepping the Global Interpreter Lock by using subprocesses instead of threads.
  • Due to this, the multiprocessing module allows the programmer to fully leverage multiple processors on a given machine. It runs on both Unix and Windows.
  • Here is a basic example of data parallelism using Pool,

from multiprocessing import Pool

def f(x):
    return x*x

if __name__ == '__main__':
    with Pool(5) as p:
        print(p.map(f, [1, 2, 3]))

will print to standard output:

[1, 4, 9]

The Process class

from multiprocessing import Process

def f(name):
    print('hello', name)

if __name__ == '__main__':
    p = Process(target=f, args=('bob',))
    p.start()
    p.join()
  • To show the individual process IDs involved, here is an expanded example:
from multiprocessing import Process
import os

def info(title):
    print(title)
    print('module name:', __name__)
    print('parent process:', os.getppid())
    print('process id:', os.getpid())

def f(name):
    info('function f')
    print('hello', name)

if __name__ == '__main__':
    info('main line')
    p = Process(target=f, args=('bob',))
    p.start()
    p.join()

Process and exceptions:

  • run(): Method represents the process’s activity.
  • start(): Start the process’s activity.
  • join([timeout]): If the optional argument timeout is None (the default), the method blocks until the process whose join() method was called terminates.
  • name: The process’s name. The name is a string used for identification purposes only.
  • is_alive(): Return whether the process is alive.
  • terminate(): Terminate the process. On Unix this is done using the SIGTERM signal.
  • kill(): Same as terminate() but using the SIGKILL signal on Unix.
  • close(): Close the Process object, releasing all resources associated with it.
  • exception multiprocessing.ProcessError: The base class of all multiprocessing exceptions.
  • exception multiprocessing.BufferTooShort: Exception is raised by Connection.recv_bytes_into() when the supplied buffer object is too small for the message read.
  • exception multiprocessing.AuthenticationError: Raised when there is an authentication error.
  • exception multiprocessing.TimeoutError: Raised by methods with a timeout when the timeout expires.
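A small sketch tying several of these methods together. Keep in mind that terminate() is abrupt: the child gets no chance to clean up, so finally blocks and exit handlers in it are skipped.

```python
import time
from multiprocessing import Process

def worker():
    time.sleep(30)  # stand-in for long-running work

if __name__ == '__main__':
    p = Process(target=worker, name='sleeper')
    p.start()
    print(p.name, p.is_alive())      # sleeper True
    p.terminate()                    # sends SIGTERM on Unix
    p.join()
    print(p.is_alive(), p.exitcode)  # False, and a nonzero exit code
    p.close()                        # release the Process object's resources
```

After close(), further method calls on the Process object raise ValueError, so call it only once you are done with the process.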

Pipes and Queues

  • When using multiple processes, one generally uses message passing for communication between processes and avoids having to use any synchronization primitives like locks.
  • For passing messages, one can use Pipe() (for a connection between two processes) or Queue() (which allows multiple producers and consumers).
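A minimal sketch of both options (queue_worker and pipe_worker are illustrative names, not library functions):

```python
from multiprocessing import Process, Queue, Pipe

def queue_worker(q):
    q.put('hello from the queue')

def pipe_worker(conn):
    conn.send('hello from the pipe')
    conn.close()

if __name__ == '__main__':
    # Queue: safe with multiple producers and multiple consumers.
    q = Queue()
    p1 = Process(target=queue_worker, args=(q,))
    p1.start()
    print(q.get())  # 'hello from the queue'
    p1.join()

    # Pipe: a connection between exactly two endpoints.
    parent_conn, child_conn = Pipe()
    p2 = Process(target=pipe_worker, args=(child_conn,))
    p2.start()
    print(parent_conn.recv())  # 'hello from the pipe'
    p2.join()
```

Note that q.get() and parent_conn.recv() both block until a message arrives, which is what makes this message-passing style work without explicit locks.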

Conclusion

  • Without multiprocessing, Python programs have trouble maxing out your system’s resources because of the GIL (Global Interpreter Lock).
  • Multiprocessing allows you to create programs that run concurrently (bypassing the GIL) and use all of your CPU cores. Though it is fundamentally different from the threading library, the syntax is quite similar. The multiprocessing library gives each process its own Python interpreter, each with its own GIL.
  • Because of this, the usual problems associated with threading (such as data corruption and deadlocks) are no longer an issue. Since the processes don’t share memory, they can’t modify the same memory concurrently.

Thanks for reading. If you found the article useful don’t forget to clap and do share it with your friends and colleagues. If you have any questions, feel free to reach out to me. Connect with me on 👉 LinkedIn, Github :)
