Huey as a minimal task queue for Django
Are you considering adding a task queue to your Django project? Then this article should be useful to you.
Adding a task queue will allow you to:
- Run longer-running processes outside the request-response cycle handled by Django (or rather, by gunicorn or uwsgi in production).
- Run scheduled tasks without relying on crontab. Scheduling via crontab usually entails management commands or other standalone scripts, each needing all environment variables loaded correctly.
When I was "younger", a task queue with a Django project meant the celery task queue.

Now that I'm "older", there are simpler alternatives. The simplest I found was Huey. But the ideas presented here apply to evaluating any task queue for your Django project.
Background
Frustrated with celery and django-celery
In December 2019 I was taking a Django project from Python 2 to 3. This project relied on celery and its Django integration for asynchronous task processing. Github project link here. This work was mostly done back in 2012-2015.
celery's "Django integration" part was the first problematic part. "Problematic" in the sense that it was not being actively maintained, and Python 3 support was only "planned" at the time.

Besides, testing celery with my Python 3 setup made me realise how "heavy" it is. "Heavy" in terms of dependencies: billiard, kombu, etc. All with their own Python 3 compatibility caveats.
I had used celery for a very long time. A decade! But this nudged me into looking for alternatives.
My use cases for using an async task queue were:
- Handle tasks asynchronously. For example, sending an email out. Tasks that you should not handle within the request-response cycle.
- Scheduled tasks.
- Retrying of failed tasks. For example, reading from an API fails due to a network error; I want that task retried a few times.
- Simple locking. More on what I mean by "simple" further down.
The above were (are) handled nicely by celery. The scheduled tasks part relied entirely on django-celery.
Another factor that pushed me "off the celery train" was something from my last long-term gig: celery's growing reputation for being "heavyweight".

Finally, celery provides a whole lot more than the basic set of use cases I need.
Is celery heavyweight?
A colleague achieved significant gains in task execution time by moving off celery to dramatiq. This was on a Python 3 project I didn't work on directly. By significant I mean roughly 50% higher throughput. In this case, "tasks" were about handling a simple message that wrote at most one row to the database, with no real processing of that message.
At about the same time the above happened, I was listening to the DjangoChat podcast.
The below is a transcript from the "Caching" episode from November 2019. Transcript here. The podcast folks were discussing caches and brought up the usual suspects: Memcached and Redis. On mentioning Redis, Carlton Gibson calls out how easy it is to add a queue when you have Redis in place, and how much of an "overkill" celery is. Emphasis in the quote below is mine:
have Redis? Yeah. You want to you want to use a queue. So let's take a good queue package. So, you know, everyone always talks about celery, but celery is overkill for, you know, the majority of use cases. So what's a good package? Well, there's one called django-q, which I love and have fun with. That's nice and simple. And that's got a Redis back end. So you pip install, right or, you know, apt install Redis. And then you pip install django-q into your project, you know, a little bit of settings, magic, and you're up and running [..]
Packages Considered
I did not compare packages. Comparing would mean installing each one, running the same task and measuring. I had what was left of a "free" day to switch over from celery to another package and have things deployed by day's end.
I considered these packages to see if I could adopt a more lightweight alternative to celery. Before I continue, by "lightweight" I mean lightweight in terms of:

- package size and dependencies
- code that I would need to rework to transition from celery to this task queue

The packages I considered were Huey, dramatiq and django-q.
I decided to move on with Huey. It was not a clear-cut decision.
My mindset was not about installing the best package, but a minimal one: one that removes my dependency on celery/django-celery and allows me to continue taking that project's codebase to Python 3.
Why not dramatiq? I did not go with dramatiq for two reasons:

- It requires the installation of another package, the "Advanced Python Scheduler" (APScheduler), to allow scheduling of tasks.
- django-dramatiq, while maintained by the author of dramatiq itself, is "yet another package".
After my experience with django-celery, any extra package scared me. I did not want to end up unable to use a main library because a smaller accompanying library was no longer maintained.
In comparison, for Huey I would only need to pip install huey. The Django integration is part of the package itself. Docs here.
Why not django-q? It is actively maintained, targeted at Django, and offers a lot of features. But by the looks of it, it offered many features I was not going to use, at the cost of being less lightweight than Huey.
The above does not mean dramatiq or django-q are not great packages. Far from it. I have them in mind in case my use case changes.
Huey
Huey's Django integration provides:

- Configuration of huey via the Django settings module.
- Running the consumer as a Django management command.
- Auto-discovery of tasks.py modules to simplify task importing.
- Proper management of database connections.
Sweet.
Code changes
Installed huey? The "Setting Things Up" section in Huey's Django integration guide covers what you need to do from then on.
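For reference, the configuration lives in a HUEY dict in your Django settings module. A minimal sketch follows; the values here are illustrative assumptions for a Redis-backed setup, not defaults, so check the huey docs for your version:

```python
# settings.py -- illustrative HUEY configuration (names/values are examples)
HUEY = {
    'huey_class': 'huey.RedisHuey',   # queue implementation backed by Redis
    'name': 'mydjangoproject',        # name of the queue
    'immediate': False,               # True runs tasks synchronously (handy in tests)
    'connection': {'host': 'localhost', 'port': 6379},
    'consumer': {
        'workers': 2,                 # number of consumer workers
    },
}
```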
Instead of importing the task or periodic_task decorators from the main Huey package, import them from huey.contrib.djhuey:

```python
from huey.contrib.djhuey import periodic_task, task
```
Huey also offers function decorators for tasks that execute queries; these automatically close the database connection:

```python
from huey.contrib.djhuey import db_periodic_task, db_task
```
Periodic tasks
This is an example of a periodic task. It calls a Django management command to clear expired user sessions every 2 hours:
```python
from huey import crontab
from huey.contrib import djhuey as huey

# Note: huey's crontab() defaults every field to '*', so pin the minute;
# otherwise the task would run every minute of every second hour.
@huey.periodic_task(crontab(minute='0', hour='*/2'))
def clear_expired_sessions():
    from django.core.management import call_command
    return call_command('clearsessions')
```
One minimal aspect of Django+Huey shows at deployment time: manage.py run_huey takes care of executing both:

- tasks that your code adds to the queue while it's running (e.g. sending an email triggered by a user action)
- scheduled (periodic) tasks

Both the task consumer and the scheduler are run by the same run_huey process.
Retries
Example: I want to fetch API data on the first of the month at 2:30PM:
```python
from huey import crontab
from huey.contrib import djhuey as huey

@huey.db_periodic_task(
    crontab(day='1', hour='14', minute='30'),
    retries=2, retry_delay=10)
@huey.lock_task('sync_gsuite_data')
def fetch_api_data():
    # function body
    ...
```
The above retries the function twice, with an interval of 10 seconds.
The one aspect missing in this setup is a hook to retry only on specific exceptions. I want an exception caused by a network error to be retried, but not a ZeroDivisionError, for example.
huey provides signals that allow you to inspect an exception on various types of events. For example, you could use the below code to notify admins when a Huey task fails without being retried:

```python
import traceback

from django.core.mail import mail_admins
from huey import signals
from huey.contrib import djhuey as huey

@huey.signal(signals.SIGNAL_ERROR)
def task_error(signal, task, exc):
    if task.retries > 0:
        return  # do not notify when the task is to be retried
    subject = f'Task [{task.name}] failed'
    message = f"""Task ID: {task.id}
Args: {task.args}
Kwargs: {task.kwargs}
Exception: {exc}
{traceback.format_exc()}"""
    mail_admins(subject, message)
```
Sidenote 1: Keep in mind signals are executed synchronously by the consumer as it processes tasks.
Sidenote 2: this is just to demonstrate what can be done. A more standard approach is to attach an email handler at log level ERROR to the huey consumer logger.
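That logging-based approach can be sketched with the standard library alone. Here a custom handler collects ERROR records from the "huey" logger; the logger name is an assumption, and in a real project the emit() body would call Django's mail_admins() instead of appending to a list:

```python
import logging

class NotifyAdminsHandler(logging.Handler):
    """Collect ERROR records; a real handler would call mail_admins() in emit()."""

    def __init__(self):
        super().__init__(level=logging.ERROR)
        self.sent = []

    def emit(self, record):
        # In production: mail_admins('huey task failed', self.format(record))
        self.sent.append(self.format(record))

handler = NotifyAdminsHandler()
logging.getLogger('huey').addHandler(handler)

# An ERROR-level message reaches the handler; INFO does not.
logging.getLogger('huey').error('Task [myapp.tasks.send_email] failed')
logging.getLogger('huey').info('consumer started')
```

Django ships an AdminEmailHandler that does exactly this kind of thing, configurable via the LOGGING setting.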
But I would prefer the hook that dramatiq provides to determine whether a task should be retried, in this style:

```python
import dramatiq

def should_retry(retries_so_far, exception):
    return retries_so_far < 3 and isinstance(exception, HttpTimeout)

@dramatiq.actor(retry_when=should_retry)
def count_words(url):
    ...
```
See? I'm veering off "minimal" and into better features. Let's move on.
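For illustration, the idea behind such a retry hook can be sketched in plain Python. This is a standalone, synchronous sketch of the concept, not huey or dramatiq API:

```python
import functools

def retry_when(predicate):
    """Re-run the wrapped function while predicate(retries_so_far, exc) is true."""
    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            retries_so_far = 0
            while True:
                try:
                    return fn(*args, **kwargs)
                except Exception as exc:
                    if not predicate(retries_so_far, exc):
                        raise  # non-retryable, or retries exhausted
                    retries_so_far += 1
        return wrapper
    return decorator

calls = []

@retry_when(lambda n, exc: n < 3 and isinstance(exc, ConnectionError))
def fetch():
    calls.append(1)
    if len(calls) < 3:
        raise ConnectionError('network hiccup')
    return 'ok'
```

Here fetch() succeeds on the third attempt, while a ZeroDivisionError would propagate immediately because the predicate rejects it.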
Simple locking
To quote huey's author himself:
A simple lock ensures that one task cannot be executed in parallel.
Example use case: a report generation task that runs every 10 minutes, but occasionally it can take 15 minutes to complete. You want to ensure that it does not start a stampede. So you use a lock to ensure that only one instance of the task can run at a time.
Example code from huey's docs on locking tasks:

```python
@huey.periodic_task(crontab(minute='*/10'))
@huey.lock_task('reports-lock')  # Goes *after* the task decorator.
def generate_report():
    run_report()
```
Deployment
In my Ubuntu setup I use supervisor for process management. For example, supervisor manages the gunicorn process which binds the Django application to Nginx.
The bash script
To run Huey, supervisor executes the following bash script:

start_huey.bash

```bash
#!/bin/bash

NAME="mydjangoproject-huey"                          # Name of the application
DJANGODIR=/home/ubuntu/webapp/mydjangoproject/proj   # Django project directory
DJANGOENVDIR=/home/ubuntu/webapp/mydjangoproject_env # Django project virtualenv

echo "Starting $NAME as `whoami`"

# Activate the virtual environment
cd $DJANGODIR
source /home/ubuntu/webapp/mydjangoproject_env/bin/activate
source /home/ubuntu/webapp/mydjangoproject/proj/.env
export PYTHONPATH=$DJANGODIR:$PYTHONPATH

# Start Huey
exec ${DJANGOENVDIR}/bin/python manage.py run_huey --flush-locks
```
If you're wondering what that --flush-locks is about, ./manage.py run_huey -h states:

    --flush-locks, -f     flush all locks when starting consumer.

Use it only if you're applying "locks" as described in the previous section.
Test the bash script above by running it. It has to be executable (chmod +x start_huey.bash if it's not). Output should be similar to this:
Starting mydjangoproject-huey as ubuntu
[2020-07-01 16:26:54,455] INFO:huey.consumer:MainThread:Huey consumer started with 1 thread, PID 12113 at 2020-07-01 14:26:54.455715
[2020-07-01 16:26:54,456] INFO:huey.consumer:MainThread:Scheduler runs every 1 second(s).
[2020-07-01 16:26:54,456] INFO:huey.consumer:MainThread:Periodic tasks are enabled.
[2020-07-01 16:26:54,456] INFO:huey.consumer:MainThread:The following commands are available:
+ myapp.tasks.send_email
[...]
supervisor conf file
File located at: /etc/supervisor/conf.d/huey.conf
```ini
; ================================
; huey supervisor
; ================================

[program:huey]
command = /home/ubuntu/webapp/start_huey.bash        ; Command to start huey
user=ubuntu
numprocs=1
stdout_logfile=/home/ubuntu/webapp/logs/huey/worker.log
stderr_logfile=/home/ubuntu/webapp/logs/huey/error.log
stdout_logfile_maxbytes=50MB
stderr_logfile_maxbytes=50MB
stdout_logfile_backups=10
stderr_logfile_backups=10
autostart=true
autorestart=true
startsecs=10

; Need to wait for currently executing tasks to finish at shutdown.
; Increase this if you have very long-running tasks.
stopwaitsecs = 2

; Causes supervisor to send the termination signal (SIGTERM) to the whole process group.
stopasgroup=true
```
A lot of the conf file above is supervisor-specific. The point of this section is to show how simple it is to have huey with Django running reliably on an Ubuntu instance.
Conclusion
What do you think about this? Can it be better? Can it be more minimal?