Update: you can check this out on GitHub here.
Celery is probably the best known task queuing Python package around. It makes asynchronous execution of Python code both possible and reasonably straightforward. It does, however, come with a good deal of complexity, and it's not as simple to use as I would like (i.e. for many use cases it's overkill). So I wrote a distributed Python task queue. In 55 lines of code (caveat: using two awesome libraries).
Background
What does a distributed task queue do? It takes code like the following (taken from the documentation for RQ, another Celery alternative):
1 2 3 4 5 |
|
and allows it to be sent to a worker process (possibly on another machine) for execution. The worker process then sends back the results after the calculation is complete. In the meantime, the sender doesn't have to block waiting for the (possibly expensive) calculation to complete. They can just periodically check if the results are ready.
So what's the absolute simplest way we could do this?
I submit to you, brokest.py
:
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 |
|
And to use it? Here's the complete contents of app.py
:
1 2 3 4 5 6 7 8 9 |
|
Rather than calling the function directly, you simply call brokest.queue
with
the function and it arguments as arguments. While the current implementation is
blocking, it would be trivially easy to make it non-blocking. Adding multiple
workers is just a matter of adding code to make use of a config file with their
locations.
Clearly, the stars here are zmq
and cloud
. ZeroMQ
makes creating distributed systems a lot easier, and the documentation is chock
full of ZeroMQ design patterns (ours is probably the simplest one, Request-Reply).
cloud
is the LGPL'd PiCloud library. PiCloud is a
company whose value proposition is that they let you seamlessly run
computationally intensive Python code using Amazon EC2 for computing resources.
Part of making that a reality, though, required a way to pickle functions and their dependencies
(functions are not normally directly pickle-able). In our example, the Worker
is able to make use of code using requests
library despite not having imported it.
It's the secret sauce that makes this all possible.
Is This For Real?
The code works, but my intention was not to create a production quality distributed task queue. Rather, it was to show how new libraries are making it easier than ever to create distributed systems. Having a way to pickle code objects and their dependencies is a huge win, and I'm angry I hadn't heard of PiCloud earlier.
One of the best things about being a programmer is the ability to tinker not
just with things, but with ideas. I can take existing ideas and tweak them, or
combine existing ideas in new ways. I think brokest
is an interesting example
of how easy it has become to create distributed systems.