Abrupt IO speed decrease in a multithreaded Python script on GCP

I’m running a Python script that loads images, preprocesses the images in some way, and saves the image back to a cache directory. There’s about 500k images. It was taking too long to run for our use case, so I added multithreading. We’re running on a GCP Compute Engine VM instance with the raw images located on a HDD, and the preprocessed images being saved to SSD.

When using 8 vCPU’s and 12 threads, it does around 30-40 images per second for the first 30 seconds or so. Awesome speed and totally works for our use case. Then after 30 seconds, it suddenly drops down to around 1 image per second, which is not acceptable. The slowdown happens essentially instantaneously and remains at a consistent 1 image per second over the course of hours.

It does a similar thing when I limited the max number of threads to 4 and even 1 thread. In the single worker case, it ran for about 3 minutes at 5 images per second before it slowed down to 1 image per second. So it seems to be some type of rate limiting / resource exhaustion / quota limit.

What could be causing this problem? And how can I resolve the problem?

Here’s a code sample showing usage (note that it uses tqdm for a progress bar):

import concurrent
import io
import pandas as pd
import tqdm

df = pd.DataFrame(...)

with concurrent.futures.ThreadPoolExecutor(max_workers=12) as pool:

    df['results'] = list(
        tqdm(total=df.shape[0], iterable=pool.map(preprocess_row,

def preprocess_row(raw_path, cache_path):
    img = io.imread(raw_path)
    img = ...                  # Arbitrary preprocessing

    io.imsave(cache_path, img)

    result = ...  # Boolean that tracks whether an error occurred

    return result

Source: Python Questions