Speed up Python without GIL

written by Ricky Lim on 2025-08-23

The GIL (Global Interpreter Lock) has long been a safety mechanism for CPython's memory management, ensuring thread safety when dealing with Python objects.

However, it also comes with a performance cost, especially for embarrassingly parallel, CPU-bound tasks, which cannot truly run in parallel with threads. Even if we spawn multiple threads, only one can execute Python bytecode at a time, leaving multi-core processors underutilized.

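Traditionally, the workaround was to use multiprocessing.Pool for parallel execution. While effective, it brings its own overhead: each worker is a separate process that has to be started, and data passed between processes has to be serialized.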
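For reference, the multiprocessing workaround can be sketched roughly as below. This is only an illustration, not the script benchmarked in this post: the sample numbers, the check_all helper, and the process count are made up, and is_prime is redefined here so the snippet runs on its own.

```python
import math
from multiprocessing import Pool


def is_prime(n: int) -> bool:
    """Trial division up to the integer square root of n."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    return all(n % i for i in range(3, math.isqrt(n) + 1, 2))


def check_all(numbers: list[int], processes: int = 4) -> list[bool]:
    # Each worker is a separate process with its own interpreter and its own GIL.
    # Arguments and results are pickled to cross the process boundary.
    with Pool(processes=processes) as pool:
        return pool.map(is_prime, numbers)


if __name__ == "__main__":
    sample = [2, 15, 97, 7919, 10**9 + 7]  # hypothetical inputs
    print(dict(zip(sample, check_all(sample))))
```

The `if __name__ == "__main__":` guard matters here: on platforms that start workers by spawning a fresh interpreter, the module is re-imported in every worker, and unguarded code would run again in each one.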

The exciting news? Since Python 3.13, CPython has offered optional GIL-free (free-threaded) builds, which started out as experimental; Python 3.14t is one of them. This means we can run threads in parallel and no longer need to rely on heavy multiprocessing for CPU-bound tasks.

Python 3.14t: Free-Threading

Python 3.14t removes the GIL, allowing threads to run truly in parallel on multi-core processors. To explore this exciting new feature, I wrote a simple Python script that uses multiple threads to check whether numbers are prime. The idea was inspired by Fluent Python by Luciano Ramalho, a great resource if you want to dive deeper into threading concepts.
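Before comparing anything, it helps to confirm which kind of interpreter is actually running. Here is a small sketch of such a check; the gil_status helper is a hypothetical name, and the getattr fallback assumes that interpreters older than 3.13 always have the GIL.

```python
import sys
import sysconfig


def gil_status() -> str:
    """Describe whether this interpreter can run threads without the GIL."""
    # Py_GIL_DISABLED is set to 1 at build time for free-threaded builds.
    free_threaded_build = bool(sysconfig.get_config_var("Py_GIL_DISABLED"))
    # sys._is_gil_enabled() exists from Python 3.13 on; older versions always have a GIL.
    check = getattr(sys, "_is_gil_enabled", None)
    gil_enabled = check() if check is not None else True
    if not free_threaded_build:
        return "standard build (GIL always on)"
    return "free-threaded build, GIL " + ("re-enabled" if gil_enabled else "disabled")


print(sys.version)
print(gil_status())
```

Even on a free-threaded build the GIL can be switched back on at runtime, for example with the PYTHON_GIL=1 environment variable, which is why the sketch checks both the build flag and the runtime state.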

Below is a quick snippet of the threaded prime-checking logic. You can find the full script here.

import math
from threading import Thread

... # Omitting TEST_CASES

def is_prime(n: int) -> bool:
    if n < 2:
        return False
    if n == 2:
        return True
    if n % 2 == 0:
        return False

    root = math.isqrt(n)
    for i in range(3, root + 1, 2):
        if n % i == 0:
            return False
    return True


class IsPrimeWorker:
    def __init__(self, n):
        self.n = n
        self.name = hash(n)
        self.result = None

    def run(self):
        self.result = is_prime(self.n)

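    @classmethod
    def create_workers(cls):
        # NUMBERS is assumed to be defined in the omitted part of the full script
        return [cls(n) for n in NUMBERS]


def main():
    workers = IsPrimeWorker.create_workers()
    # One thread per number; each thread stores its answer in worker.result
    threads = [Thread(target=worker.run) for worker in workers]
    for t in threads:
        t.start()
    # Wait for every thread to finish before reading the results
    for t in threads:
        t.join()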

To run the script with both Python 3.14 (with GIL) and Python 3.14t (GIL-free), use the following commands:

# Run with GIL enabled
uv run -p 3.14 is_prime.py

# Run with GIL removed
uv run -p 3.14t is_prime.py

To benchmark the performance difference between the two versions, use:

uv run --python 3.14 benchmark.py

The benchmark script is available here if you'd like to try it yourself.
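To give a feel for how wall time and CPU usage can be measured, here is a minimal in-process sketch. It is not the benchmark script itself: measure, burn, and run_threads are hypothetical names, the workload is a stand-in, and memory measurement is left out entirely.

```python
import time
from threading import Thread


def measure(func, *args):
    """Return (wall_seconds, cpu_percent) for one call of func(*args)."""
    wall_start = time.perf_counter()
    cpu_start = time.process_time()  # CPU time summed over all threads of this process
    func(*args)
    wall = time.perf_counter() - wall_start
    cpu = time.process_time() - cpu_start
    # Above 100% means more than one core was busy on average.
    return wall, 100.0 * cpu / wall


def burn(n: int) -> int:
    """CPU-bound busy work (stand-in workload)."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def run_threads(num_threads: int, n: int) -> None:
    threads = [Thread(target=burn, args=(n,)) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


if __name__ == "__main__":
    wall, cpu_pct = measure(run_threads, 4, 2_000_000)
    print(f"{wall:.2f}s, {cpu_pct:.0f}%")
```

With the GIL, the CPU percentage should hover around 100% no matter how many threads are started; on a free-threaded build it can climb toward 100% times the number of busy cores.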

Here are the benchmark results from my MacBook:

Testing Python 3.14 (GIL)...
  Run 1: 12.51s, 100%,  16.7MB
  Run 2: 12.43s, 100%,  15.2MB
  Run 3: 12.77s, 100%,  15.0MB
Testing Python 3.14t (No GIL)...
  Run 1: 4.84s, 569%,  22.2MB
  Run 2: 4.69s, 575%,  22.1MB
  Run 3: 4.83s, 565%,  27.3MB
| Metric | Python 3.14 (GIL) | Python 3.14t (No GIL) | Improvement |
|---|---|---|---|
| Wall Time | 12.57 s | 4.79 s | 2.63× faster |
| CPU Usage | 100 % | 570 % | 5.7× cores |
| Memory Usage | 15.6 MB | 23.9 MB | 1.53× overhead |
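The summary figures are plain averages over the three runs, and the ratios are computed from those averages. A short sketch of the arithmetic, using the run values printed above (the variable names are mine):

```python
from statistics import mean

# Per-run values copied from the benchmark output above.
gil_wall, nogil_wall = [12.51, 12.43, 12.77], [4.84, 4.69, 4.83]  # seconds
gil_cpu, nogil_cpu = [100, 100, 100], [569, 575, 565]              # percent
gil_mem, nogil_mem = [16.7, 15.2, 15.0], [22.2, 22.1, 27.3]        # MB

speedup = mean(gil_wall) / mean(nogil_wall)
cores = mean(nogil_cpu) / mean(gil_cpu)
mem_overhead = mean(nogil_mem) / mean(gil_mem)

print(f"Wall time: {mean(gil_wall):.2f} s vs {mean(nogil_wall):.2f} s ({speedup:.2f}x faster)")
print(f"CPU usage: {mean(gil_cpu):.0f} % vs {mean(nogil_cpu):.0f} % ({cores:.1f}x cores)")
print(f"Memory:    {mean(gil_mem):.1f} MB vs {mean(nogil_mem):.1f} MB ({mem_overhead:.2f}x)")
```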

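This comparison, fresh from my own local machine 🤗, shows that the free-threaded build finished about 2.63× faster by spreading the work across roughly 5.7 cores, at the cost of about 1.53× the memory.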

Things to watch out for with threading:
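A general pitfall with threads, GIL or not, is unsynchronized updates to shared state: a read-modify-write such as value += 1 is not atomic, so concurrent threads can lose updates. The sketch below is illustrative only (Counter, hammer, and run are hypothetical names) and shows one common remedy, guarding the shared value with a threading.Lock.

```python
from threading import Lock, Thread


class Counter:
    """A counter that is safe to increment from several threads."""

    def __init__(self) -> None:
        self.value = 0
        self._lock = Lock()

    def increment(self) -> None:
        # The lock makes the read-modify-write below a single critical section.
        with self._lock:
            self.value += 1


def hammer(counter: Counter, times: int) -> None:
    for _ in range(times):
        counter.increment()


def run(num_threads: int = 8, times: int = 10_000) -> int:
    counter = Counter()
    threads = [Thread(target=hammer, args=(counter, times)) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.value


if __name__ == "__main__":
    print(run())  # equals num_threads * times because every increment is locked
```

The prime-checking workers shown earlier avoid this problem by design: each thread writes only to its own worker object, so nothing is shared between them.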

Key Takeaways