Multiprocessing in Python
Nov 30, 2021
- In Python, single-CPU use is caused by the global interpreter lock (GIL), which allows only one thread to hold control of the Python interpreter at any given time. The GIL was implemented to keep memory management thread-safe, but as a result, a single Python process executes Python code on only one processor at a time.
- The multiprocessing module allows the programmer to fully leverage multiple processors on a given machine. The API used is similar to the classic threading module. It offers both local and remote concurrency.
- The multiprocessing module avoids the limitations of the Global Interpreter Lock (GIL) by using subprocesses instead of threads. The multiprocessing code does not execute in the same order as the serial code. There is no guarantee that the first process to be created will be the first to complete.
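The point about completion order can be illustrated with a minimal sketch. The work function and the delay values are hypothetical, chosen only for illustration; imap_unordered yields each result as soon as its worker finishes, so the order of the printed lines may differ from the order of the inputs.

```python
import time
from multiprocessing import Pool


def work(delay):
    # Hypothetical task: sleep to simulate work of varying duration.
    time.sleep(delay)
    return delay


if __name__ == "__main__":
    delays = [0.3, 0.1, 0.2]  # illustrative values only
    with Pool(3) as pool:
        # Results arrive in completion order, not submission order.
        for finished in pool.imap_unordered(work, delays):
            print("finished task with delay", finished)
```

If the results must come back in input order, Pool.map preserves it at the cost of waiting for earlier items.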
Why Is Multiprocessing Useful?
What is the fastest way to sum all numbers from 1 to 1000?
- By default, Python uses only one core at a time. Consider which of these two approaches is faster for a simple math problem:
- Sum the values one by one, adding each number to the running total (1+2=3, 3+3=6, 6+4=10, and so on). One core is working on this task.
- Split the values beforehand into individual chunks and sum each chunk first (1 to 300, 301 to 600, and 601 to 1000). Three cores will be working at the same time (the last step would be to sum the three values received).
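The chunked approach above can be sketched roughly as follows. The helper names chunk_sum and parallel_sum are made up for this illustration, and the chunk boundaries are the ones from the list above. For a job this small, the cost of starting processes will likely outweigh any gain; the sketch only shows the shape of the technique.

```python
from multiprocessing import Pool


def chunk_sum(bounds):
    # Sum every integer from start to stop, inclusive.
    start, stop = bounds
    return sum(range(start, stop + 1))


def parallel_sum(chunks, workers=3):
    # Each worker process sums one chunk; the parent adds the partial sums.
    with Pool(workers) as pool:
        partial_sums = pool.map(chunk_sum, chunks)
    return sum(partial_sums)


if __name__ == "__main__":
    chunks = [(1, 300), (301, 600), (601, 1000)]
    print(parallel_sum(chunks))  # 500500
```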
A good starting point would be to learn how the multiprocessing library in Python works.
What Should You Use?
- If your code has a lot of I/O or network usage: multithreading is your best bet because of its low overhead.
- If you have a GUI: use multithreading so your UI thread doesn’t get locked up.
- If your code is CPU bound: you should use multiprocessing (if your machine has multiple cores).
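As a rough sketch of this rule of thumb, the example below runs a stand-in I/O task on a thread pool and a stand-in CPU task on a process pool. The functions io_task and cpu_task are hypothetical placeholders, and the pool sizes are arbitrary; ThreadPool from multiprocessing.pool is used here only because it shares the Pool interface.

```python
import time
from multiprocessing import Pool
from multiprocessing.pool import ThreadPool


def io_task(delay):
    # Stand-in for waiting on a network or disk; sleeping releases the GIL.
    time.sleep(delay)
    return delay


def cpu_task(n):
    # Stand-in for CPU-bound work: pure Python arithmetic holds the GIL.
    return sum(i * i for i in range(n))


if __name__ == "__main__":
    # I/O-bound: threads are cheap and overlap the waiting time.
    with ThreadPool(4) as threads:
        print(threads.map(io_task, [0.1] * 4))

    # CPU-bound: separate processes can run on separate cores.
    with Pool(4) as procs:
        print(procs.map(cpu_task, [10_000] * 4))
```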
Multiprocessing Examples
- multiprocessing is a package that supports spawning processes using an API similar to the threading module.
- The multiprocessing package offers both local and remote concurrency, effectively side-stepping the Global Interpreter Lock by using subprocesses instead of threads.
- Due to this, the multiprocessing module allows the programmer to fully leverage multiple processors on a given machine. It runs on both Unix and Windows.
- This basic example of data parallelism using Pool,
from multiprocessing import Pool

def f(x):
    return x*x

if __name__ == '__main__':
    # Distribute the inputs across a pool of 5 worker processes
    with Pool(5) as p:
        print(p.map(f, [1, 2, 3]))
will print to standard output:
[1, 4, 9]
The Process class
- In multiprocessing, processes are spawned by creating a Process object and then calling its start() method. Process follows the API of threading.Thread. A trivial example of a multiprocess program is
from multiprocessing import Process

def f(name):
    print('hello', name)

if __name__ == '__main__':
    p = Process(target=f, args=('bob',))
    p.start()  # launch the child process
    p.join()   # wait for the child to finish
- To show the individual process IDs involved, here is an expanded example:
from multiprocessing import Process
import os

def info(title):
    # Print the IDs of the current process and its parent
    print(title)
    print('module name:', __name__)
    print('parent process:', os.getppid())
    print('process id:', os.getpid())

def f(name):
    info('function f')
    print('hello', name)

if __name__ == '__main__':
    info('main line')
    p = Process(target=f, args=('bob',))
    p.start()
    p.join()
Process and exceptions:
- run(): Method representing the process’s activity.
- start(): Start the process’s activity.
- join([timeout]): If the optional argument timeout is None (the default), the method blocks until the process whose join() method is called terminates.
- name: The process’s name. The name is a string used for identification purposes only.
- is_alive(): Return whether the process is alive.
- terminate(): Terminate the process.
- kill(): Same as terminate() but using the SIGKILL signal on Unix.
- close(): Close the Process object, releasing all resources associated with it.
- exception multiprocessing.ProcessError: The base class of all multiprocessing exceptions.
- exception multiprocessing.BufferTooShort: Exception raised by Connection.recv_bytes_into() when the supplied buffer object is too small for the message read.
- exception multiprocessing.AuthenticationError: Raised when there is an authentication error.
- exception multiprocessing.TimeoutError: Raised by methods with a timeout when the timeout expires.
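Several of the methods listed above can be seen working together in a short sketch. The slow_task and describe helpers, the process name "sleeper", and the timings are assumptions made for illustration; only the Process methods themselves come from the list.

```python
import time
from multiprocessing import Process


def slow_task(seconds):
    # Hypothetical long-running job.
    time.sleep(seconds)


def describe(process):
    # Report the process name together with its current state.
    state = "alive" if process.is_alive() else "stopped"
    return f"{process.name}: {state}"


if __name__ == "__main__":
    p = Process(target=slow_task, args=(30,), name="sleeper")
    p.start()
    p.join(timeout=0.5)  # wait at most half a second, then carry on
    print(describe(p))   # the child is most likely still sleeping here
    p.terminate()        # ask the child to stop
    p.join()             # wait until it has actually exited
    print(describe(p))
    p.close()            # release the resources held by the Process object
```

Note that is_alive() can no longer be called once close() has run, which is why describe is called before it.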
Pipes and Queues
- When using multiple processes, one generally uses message passing for communication between processes and avoids having to use any synchronization primitives like locks.
- For passing messages, one can use Pipe() (for a connection between two processes) or a queue (which allows multiple producers and consumers).
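Both message-passing styles can be sketched in a few lines. The worker functions square_worker and queue_producer are hypothetical, and the None sentinel that marks the end of the queue is a convention chosen for this example, not something the library requires.

```python
from multiprocessing import Pipe, Process, Queue


def square_worker(conn):
    # Receive one number over the pipe, send back its square, then close.
    value = conn.recv()
    conn.send(value * value)
    conn.close()


def queue_producer(queue, items):
    # Put every item on the queue, followed by a None sentinel (our convention).
    for item in items:
        queue.put(item)
    queue.put(None)


if __name__ == "__main__":
    # Pipe: a two-way connection between exactly two processes.
    parent_conn, child_conn = Pipe()
    worker = Process(target=square_worker, args=(child_conn,))
    worker.start()
    parent_conn.send(7)
    print(parent_conn.recv())  # 49
    worker.join()

    # Queue: supports multiple producers and consumers.
    queue = Queue()
    producer = Process(target=queue_producer, args=(queue, [1, 2, 3]))
    producer.start()
    received = []
    while (item := queue.get()) is not None:
        received.append(item)
    producer.join()
    print(received)
```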
Conclusion
- Without multiprocessing, Python programs have trouble maxing out your system’s specs because of the GIL (Global Interpreter Lock).
- Multiprocessing allows you to create programs that run in parallel (bypassing the GIL) and use all of your CPU cores. Though it is fundamentally different from the threading library, the syntax is quite similar. The multiprocessing library gives each process its own Python interpreter, each with its own GIL.
- Because of this, some of the usual problems associated with threading (such as data corruption from concurrent writes) are far less of a concern. Since the processes don’t share memory by default, they can’t modify the same memory concurrently. Deadlocks remain possible, however, when processes wait on each other through locks, pipes, or queues.