LLMs make embarrassingly parallelizable scripts even more embarrassingly parallelizable
One of my favorite LLM tricks is to quickly make a script take advantage of parallelism.
Before we get too far, let me clarify that I use “parallelism” loosely to describe concurrent execution.
Modern computers, of course, have lots of cores. While much of the serious software we use and write takes advantage of this, usually scripts (especially one-offs) do not warrant the implementation work required to use those extra cores.
That means that many embarrassingly parallelizable scripts don’t get parallelized. As the name suggests[1], that’s embarrassing.
For me, LLMs have made such low-hanging parallelism even more embarrassing, because they make it super easy to add that boilerplate to a script.
I mostly use Python, so my example is Python-specific, but this idea applies broadly to quick scripts and one-off programs.
An example in Python
Sometimes, I need to do something across a bunch of files, hostnames, rows in a CSV, etc.
For example, recently I wanted to download a bunch of pages from a website. Very simple: a call to requests.get in a for loop.
import requests

BASE_URL = "https://www.example.com/pages/"

def download_posts():
    for page in range(1, 33):
        filename = f"{page}.html"
        url = f"{BASE_URL}{page}"
        response = requests.get(url)
        with open(filename, "w", encoding="utf-8") as file:
            file.write(response.text)

if __name__ == "__main__":
    download_posts()
How long does it take?
$ time python download_posts.py
python download_posts.py 0.44s user 0.12s system 2% cpu 25.780 total
That’s slow. Almost 26 seconds of my precious wall time, and I’m only downloading 32 pages because it’s a toy example. Usually I’m looping over thousands of items.
This is an embarrassingly parallelizable script
Downloading each page is completely independent. There is no need for any concurrency control or coordination. To parallelize it, we only need boilerplate.
This work is I/O-bound, so multithreading is appropriate.[2] Download the pages in parallel; we’ve got CPUs and network bandwidth to spare.
I’ll be the first to admit, though, that I wouldn’t be able to implement multithreading for this script without consulting the docs. I’m just not going to do that for a one-off.
So, I say to the LLM, “Use Multithreading to parallelize this.”
The LLM (Claude 3.7 Thinking) uses concurrent.futures. It applies a boilerplate pattern I’m now used to seeing, so I know it looks right.[3]
I set max_workers to 32 to show you where the parallelism can be set. If you leave it out, ThreadPoolExecutor defaults to min(32, os.cpu_count() + 4) (as of Python 3.8), which is a good starting point. This is a parameter you want to tune yourself. For example, in this case, if there were enough pages to matter, I’d raise it until I saturated my network interface.
import concurrent.futures
import requests

BASE_URL = "https://www.example.com/pages/"

def download_single_page(page):
    filename = f"{page}.html"
    url = f"{BASE_URL}{page}"
    response = requests.get(url)
    with open(filename, "w", encoding="utf-8") as file:
        file.write(response.text)

def download_posts():
    with concurrent.futures.ThreadPoolExecutor(max_workers=32) as executor:
        # Submit all page downloads to the thread pool
        futures = [executor.submit(download_single_page, page) for page in range(1, 33)]
        # Wait for all downloads to complete
        for future in concurrent.futures.as_completed(futures):
            # This will raise any exceptions from the threads
            future.result()

if __name__ == "__main__":
    download_posts()
25 times faster with no effort.
$ time python download_posts.py
python download_posts.py 0.22s user 0.07s system 27% cpu 1.021 total
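As an aside: submit plus as_completed is the pattern the LLM produced, but the standard library also offers a more compact equivalent via executor.map. Here’s a sketch of that variant (mine, not the LLM’s):

import concurrent.futures
import requests

BASE_URL = "https://www.example.com/pages/"

def download_single_page(page):
    response = requests.get(f"{BASE_URL}{page}")
    with open(f"{page}.html", "w", encoding="utf-8") as file:
        file.write(response.text)

def download_posts():
    with concurrent.futures.ThreadPoolExecutor(max_workers=32) as executor:
        # map() is lazy; list() forces iteration so exceptions
        # raised in worker threads actually surface here
        list(executor.map(download_single_page, range(1, 33)))

if __name__ == "__main__":
    download_posts()

The one gotcha is that map returns results lazily, so you have to consume the iterator for worker exceptions to propagate.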
If you know what you’re doing, it’s easy to parallelize non-embarrassing scripts too
Often, your scripts store some data in an in-memory data structure (e.g. the set of paths you’ve already visited) or write to a file (e.g. update a CSV).
That is not embarrassingly parallelizable: it takes more than boilerplate. You will need to make your script thread-safe.
Let’s say that I wanted to log to a CSV for every page I visited. Maybe I want to keep track of the response codes or something.
import requests
import concurrent.futures
import csv

BASE_URL = "https://www.example.com/pages/"

def download_single_page(page, csv_writer):
    filename = f"{page}.html"
    url = f"{BASE_URL}{page}"
    response = requests.get(url)
    with open(filename, "w", encoding="utf-8") as file:
        file.write(response.text)
    # Write to csv
    csv_writer.writerow([page, url, response.status_code])

def download_posts(csv_writer):
    with concurrent.futures.ThreadPoolExecutor(max_workers=32) as executor:
        # Submit all page downloads to the thread pool
        futures = [executor.submit(download_single_page, page, csv_writer) for page in range(1, 33)]
        # Wait for all downloads to complete
        for future in concurrent.futures.as_completed(futures):
            future.result()

if __name__ == "__main__":
    csv_file = open('download_results.csv', 'w', newline='')
    csv_writer = csv.writer(csv_file)
    download_posts(csv_writer)
    csv_file.close()
The above is not thread-safe. The download_single_page threads might try to write to that file concurrently, and csv.writer itself is not thread-safe.
One easy way to manage concurrent write access to a resource is to make each thread take a lock before mutating it.
You can tell the LLM to implement it for you, of course, but, importantly, you need to know that it’s necessary in the first place, and you need to understand concurrency well enough to check that it’s implemented right.
So I say to the LLM, “Use a Lock when writing the CSV file”. The LLM uses threading.Lock, which works as a context manager: wrap a block in with lock: and only one thread can execute that block at a time.
import requests
import concurrent.futures
import csv
import threading

BASE_URL = "https://www.example.com/pages/"

def download_single_page(page, csv_writer, csv_lock):
    filename = f"{page}.html"
    url = f"{BASE_URL}{page}"
    response = requests.get(url)
    with open(filename, "w", encoding="utf-8") as file:
        file.write(response.text)
    # Take the lock so only one thread writes to the CSV at a time
    with csv_lock:
        csv_writer.writerow([page, url, response.status_code])

def download_posts(csv_writer, csv_lock):
    with concurrent.futures.ThreadPoolExecutor(max_workers=32) as executor:
        # Submit all page downloads to the thread pool
        futures = [executor.submit(download_single_page, page, csv_writer, csv_lock) for page in range(1, 33)]
        # Wait for all downloads to complete
        for future in concurrent.futures.as_completed(futures):
            future.result()

if __name__ == "__main__":
    csv_file = open('download_results.csv', 'w', newline='')
    csv_writer = csv.writer(csv_file)
    # One mutually exclusive lock, shared by all the worker threads
    csv_lock = threading.Lock()
    download_posts(csv_writer, csv_lock)
    csv_file.close()
This is far from a perfect implementation. For example, using a single writer thread with a queue would prevent contention for the lock and the file. But this is safe and fast enough.
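For the curious, here’s a minimal sketch of that single-writer pattern. The writer_thread helper and the SENTINEL value are my own names for illustration, not something from the scripts above: workers push rows onto a thread-safe queue.Queue, and one dedicated thread owns the CSV file.

import csv
import queue
import threading
import concurrent.futures
import requests

BASE_URL = "https://www.example.com/pages/"
SENTINEL = None  # signals the writer thread to stop

def writer_thread(csv_writer, row_queue):
    # Only this thread ever touches the CSV file
    while True:
        row = row_queue.get()
        if row is SENTINEL:
            break
        csv_writer.writerow(row)

def download_single_page(page, row_queue):
    url = f"{BASE_URL}{page}"
    response = requests.get(url)
    with open(f"{page}.html", "w", encoding="utf-8") as file:
        file.write(response.text)
    # No explicit lock needed: queue.Queue is thread-safe
    row_queue.put([page, url, response.status_code])

def download_posts():
    row_queue = queue.Queue()
    with open("download_results.csv", "w", newline="") as csv_file:
        writer = threading.Thread(
            target=writer_thread, args=(csv.writer(csv_file), row_queue)
        )
        writer.start()
        try:
            with concurrent.futures.ThreadPoolExecutor(max_workers=32) as executor:
                futures = [
                    executor.submit(download_single_page, page, row_queue)
                    for page in range(1, 33)
                ]
                for future in concurrent.futures.as_completed(futures):
                    future.result()
        finally:
            # Always unblock the writer, even if a download raised
            row_queue.put(SENTINEL)
            writer.join()

if __name__ == "__main__":
    download_posts()

Because queue.Queue does its own internal locking, the workers never touch the file, and there’s no lock for them to contend over.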
Conclusion
Say what you will about LLMs, but they are killer at boilerplate. That often saves you the time you’d spend on monotonous implementation. They can also literally speed up your scripts.
Just remember that you should actually try to understand concurrency[4], because applying this without understanding what’s going on can cause the loss of toes.
Notes
1. I know that’s not what “embarrassingly parallelizable” really means. (back)
2. You can’t blame the GIL for failing to parallelize in this case: Python releases the GIL during blocking I/O, so these threads genuinely run concurrently. (back)
3. In real life I’d add more error handling and logging, even to a one-off script (yes, LLMs make that easy too). (back)
4. What made concurrency click for me was Martin Kleppmann’s Designing Data-Intensive Applications (which, like, this is probably the 100th time you’ve seen this book recommended) (edition 2 is coming sometime soon!) and the notes from his Distributed Systems class. (back)