
Django and Celery

As you write your application, you will certainly need to execute some asynchronous tasks. They could be anything that requires some form of (lengthy) processing: image resizing, archiving, document analysis...

These tasks could be run from the same machine where your application server runs, but best practice advises running them on a different machine because:

  1. You avoid the background jobs impacting your application;
  2. Decoupling parts of the application eases maintenance and scaling.

According to the Celery project homepage, Celery is “an asynchronous task queue/job queue based on distributed message passing. It is focused on real-time operation, but supports scheduling as well”.

Django is the famous Python framework that describes itself as a “high-level Python Web framework that encourages rapid development and clean, pragmatic design”.

This tutorial will show how to use Celery and Django to build a simple web application that executes tasks on a remote daemon. The integration of Celery into the Django administration panel using django-celery will be covered too.

This tutorial is based on the Django tutorial.

Application Architecture

To experiment with this tutorial, you can clone the application from https://bitbucket.org/lopter/dj-celery/. Here is what the application directory looks like:

.
├── dotcloud.yml     # The description of our stack
├── minestrone/      # The Django project directory
│   ├── __init__.py
│   ├── manage.py
│   ├── settings.py
│   ├── soup/        # Holds the application code
│   ├── templates/   # Holds the templates
│   └── urls.py
├── mkadmin.py       # Used to create the admin account after `dotcloud push'
├── nginx.conf       # Some Nginx rules to serve Django static files
├── postinstall*     # Run at the end of the dotCloud build to set up Django and Celery
├── requirements.txt # Holds the Python dependencies: `Django' and `django-celery'
└── wsgi.py          # The entry point of Django for Nginx

The relevant Python code is located in the minestrone/ [1] directory where we have:

  • settings.py: Configures the database as well as the Celery broker (RabbitMQ), which stores the list of tasks to execute;
  • soup/views.py: Defines a web page to enqueue tasks and a page to display the active ones;
  • soup/tasks.py: Holds the job definitions.

Once deployed, the application runs like this:

  *--------*                     +----------+
  |        |                     |          |
  | Django >--- Enqueue tasks ---> RabbitMQ >-----.
  |        |                     | (Broker) |     |
  *---v----*                     +----------+     |
      |                                           |
      | Query                    *----------*     |
      |                          |  Celery  <-----+
    +-v----------+  .-- Events --<  Worker  |     |
    |            | /             *----------*     | Consume
    | PostgreSQL <=                               | & Run Tasks
    |            | \             *----------*     |
    +------------+  `-- Events --<  Celery  |     |
                                 |  Worker  <-----'
                                 *----------*

We will see how to connect Celery to the RabbitMQ broker and launch some Celery workers, then how to create tasks. There are also some dotCloud-specific files, which will be covered last.

Setting up a RabbitMQ Server

In order to follow this tutorial, you will first need a RabbitMQ service. dotCloud recommends CloudAMQP for getting a RabbitMQ server. Follow the directions in our CloudAMQP tutorial to set up your RabbitMQ server.

Connect Celery to RabbitMQ

This is just a matter of editing settings.py to specify the host, port, user name, and password of RabbitMQ. Assuming you followed the steps above when setting up your RabbitMQ server, these credentials can be found in the file ~/environment.json, which is generated by the dotCloud build process when you push your application to dotCloud.

The environment file is loaded into a Python dictionary at the beginning of the settings.py file:

# minestrone/settings.py:

# Django settings for minestrone project.

import os
import json
import djcelery

# Load the dotCloud environment
with open('/home/dotcloud/environment.json') as f:
  dotcloud_env = json.load(f)

# …
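
For reference, here is an illustrative, abridged sketch of what this file can look like; the keys below are the ones used in this tutorial, while the values are just placeholders:

{
    "DOTCLOUD_SERVICE_NAME": "www",
    "DOTCLOUD_DB_SQL_HOST": "db.example.com",
    "DOTCLOUD_DB_SQL_PORT": "5432",
    "CLOUDAMQP_RABBITMQ_AMQP_HOST": "broker.example.com",
    "CLOUDAMQP_RABBITMQ_AMQP_PORT": "5672"
}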

With the credentials parsed, django-celery is set up and the Celery broker is configured:

# minestrone/settings.py:

# …

# Configure Celery using the RabbitMQ credentials found in the dotCloud
# environment.
djcelery.setup_loader()
BROKER_HOST = dotcloud_env['CLOUDAMQP_RABBITMQ_AMQP_HOST']
BROKER_PORT = int(dotcloud_env['CLOUDAMQP_RABBITMQ_AMQP_PORT'])
BROKER_USER = dotcloud_env['CLOUDAMQP_RABBITMQ_AMQP_LOGIN']
BROKER_PASSWORD = dotcloud_env['CLOUDAMQP_RABBITMQ_AMQP_PASSWORD']
BROKER_VHOST = dotcloud_env['CLOUDAMQP_RABBITMQ_AMQP_VIRTUALHOST']

BROKER_VHOST corresponds to a namespace in RabbitMQ where Exchanges and Queues are stored. If you decide to use a different broker (e.g. MongoDB or Redis), this will correspond to the database name or number to use.
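
For example, here is what the equivalent settings could look like with the Redis broker of this generation of Celery (a minimal sketch; the MY_REDIS_* environment keys are hypothetical):

# Hypothetical Redis equivalent of the RabbitMQ settings above.
BROKER_BACKEND = 'redis'                          # use the Redis transport
BROKER_HOST = dotcloud_env['MY_REDIS_HOST']       # hypothetical key
BROKER_PORT = int(dotcloud_env['MY_REDIS_PORT'])  # hypothetical key
BROKER_VHOST = '0'                                # the Redis database number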

We also configure the default Celery queue. This is completely optional; we simply use it in this tutorial to illustrate how you can route tasks with RabbitMQ:

# minestrone/settings.py:

# A very simple queue, just to illustrate the principle of routing.
CELERY_DEFAULT_QUEUE = 'default'
CELERY_QUEUES = {
    'default': {
        'exchange': 'default',
        'exchange_type': 'topic',
        'binding_key': 'tasks.#'
    }
}

# …
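
To make the routing concrete: with a topic exchange, “#” matches zero or more dot-separated words, so any routing key starting with “tasks.” ends up in this queue. A small illustrative sketch, using the “lazy_job” task defined later in this tutorial:

from minestrone.soup import tasks

# Both calls are delivered to the 'default' queue, because the topic
# binding key 'tasks.#' matches any routing key beginning with 'tasks.':
tasks.lazy_job.apply_async(args=['resize'], routing_key='tasks.resize')
tasks.lazy_job.apply_async(args=['nested'], routing_key='tasks.images.resize')

# A routing key outside of the 'tasks.' hierarchy would not match this
# binding and, with no other queue declared, would not be consumed.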

Running the Workers

With django-celery, the Celery workers are launched using manage.py from the root of the application, with the command:

$ python minestrone/manage.py celeryd -E -l info -c 2

Here is what each command switch does:

  • -E activates events: it tells the workers to send notifications of what they are doing (started/finished a task, etc.);
  • -l info asks the workers to log every message with a priority of “info” or higher;
  • -c 2 launches two worker processes (“c” as in “concurrency”).
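
Note that if you later declare several queues, celeryd also accepts a -Q switch to restrict a worker to the queues you list; for example (shown for illustration only, this tutorial uses a single “default” queue):

$ python minestrone/manage.py celeryd -E -l info -c 2 -Q default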

We will see how to automatically run this on dotCloud when you push your code, but let’s create some tasks to execute first.

Create Some Tasks

Tasks, in the Celery sense, can be plain functions using the “@celery.task.task” decorator or classes that inherit from the “celery.task.Task” class. In a Django application, you place your tasks in a tasks.py module, side by side with your views or models.
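
For comparison, here is a minimal, hypothetical sketch of the class-based form (this tutorial uses the decorator form shown below):

# A hypothetical class-based equivalent of the decorated task below.
from celery.task import Task

class LazyJobTask(Task):
    ignore_result = True  # same effect as ignore_result=True on the decorator

    def run(self, name):
        # run() is what the worker calls when it executes the task.
        logger = self.get_logger()
        logger.info('Running the lazy job: {0}'.format(name))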

This tutorial defines a very simple task, called “lazy_job”, in the “soup” application:

# minestrone/soup/tasks.py:

import time

from celery.task import task

@task(ignore_result=True)
def lazy_job(name):
    logger = lazy_job.get_logger()
    logger.info('Starting the lazy job: {0}'.format(name))
    time.sleep(5)
    logger.info('Lazy job {0} completed'.format(name))

We give the “ignore_result=True” argument to the task decorator to tell Celery that we don’t care about the result of our task. This is advised by the Celery documentation to reduce resource usage [2].

New “lazy_job” tasks are created from the “EditorView” view using the “apply_async” method:

# minestrone/soup/views.py:

from django.http import HttpResponseRedirect
from django.views.generic import TemplateView, FormView
from django import forms
from celery.task.control import inspect

from minestrone.soup import tasks

# …

class EditorView(FormView):

    # …

    def form_valid(self, form):
        name = form.cleaned_data['job_name']
        routing_key = 'tasks.{0}'.format(form.cleaned_data['routing_key_name'])
        tasks.lazy_job.apply_async(args=[name], routing_key=routing_key)
        return HttpResponseRedirect(self.get_success_url())

The “name” and “routing_key” variables are extracted from the form embedded in the view. The “apply_async” method can take several arguments, all optional; here we are only using the two below (a scheduling example follows the list):

  1. “args”, which contains the list of arguments to forward to the “lazy_job” function;
  2. “routing_key”, which can be used to route tasks to different workers; here, though, the key will always be matched by the “default” queue defined in minestrone/settings.py.
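
Among the other optional arguments, “countdown” is worth mentioning: it delays the execution of the task by the given number of seconds. A small sketch (the ten-second delay is arbitrary):

# Enqueue the task, but ask the workers to wait at least ten seconds
# before running it:
tasks.lazy_job.apply_async(args=['delayed'], routing_key='tasks.delayed',
                           countdown=10)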

The Django Celery Admin Panel

Without really telling you, we have already done half of the Django and Celery integration. The two missing pieces are:

  • a database to store all the Celery events;
  • running the celerycam daemon, which takes snapshots of the events sent by the workers and stores them in the database.

The database is configured from the settings file, using the dotCloud environment:

# minestrone/settings.py:

# …

DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.postgresql_psycopg2',
        'NAME': 'template1',
        'USER': dotcloud_env['DOTCLOUD_DB_SQL_LOGIN'],
        'PASSWORD': dotcloud_env['DOTCLOUD_DB_SQL_PASSWORD'],
        'HOST': dotcloud_env['DOTCLOUD_DB_SQL_HOST'],
        'PORT': int(dotcloud_env['DOTCLOUD_DB_SQL_PORT']),
    }
}

We are using PostgreSQL, but any database supported by Django would work.
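
For instance, switching to MySQL would only require changing the engine and credentials; a minimal sketch (the DOTCLOUD_DB_MYSQL_* keys are hypothetical, check your own environment.json for the exact names):

DATABASES = {
    'default': {
        'ENGINE': 'django.db.backends.mysql',  # Django's stock MySQL backend
        'NAME': 'minestrone',
        'USER': dotcloud_env['DOTCLOUD_DB_MYSQL_LOGIN'],         # hypothetical
        'PASSWORD': dotcloud_env['DOTCLOUD_DB_MYSQL_PASSWORD'],  # hypothetical
        'HOST': dotcloud_env['DOTCLOUD_DB_MYSQL_HOST'],          # hypothetical
        'PORT': int(dotcloud_env['DOTCLOUD_DB_MYSQL_PORT']),     # hypothetical
    }
}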

The command to launch the celerycam daemon is:

$ python minestrone/manage.py celerycam

As with the Celery workers, we will see in the next sections how to automatically execute this command when you push to dotCloud.

For your reference, the other parts of the Django Celery integration were:

  • Celery is configured from the Django settings.py file (instead of the usual celeryconfig.py file);
  • celeryd is invoked through Django’s manage.py command instead of being run directly from the shell;
  • You have to pass the -E argument to celeryd to collect worker events.

dotCloud Specific Details

We have already seen how we use the environment.json file to configure Celery and PostgreSQL automatically. But it is not the only dotCloud-specific detail; there are also:

  • A dotCloud build file: dotcloud.yml;
  • A Supervisor configuration include, supervisord.conf, which is installed from the postinstall hook;
  • A wsgi.py file that bridges the web server and Django;
  • An nginx.conf that defines a couple of locations for the Django static and media files.

For the Python dependencies, dotCloud follows the requirements.txt convention; this file contains:

Django
django-celery
setproctitle

These dependencies will be installed by pip (an easy_install replacement) from PyPI when the application is pushed to dotCloud. The setproctitle package is an optional dependency of Celery; when it is installed, Celery can display useful information in the process title instead of the command line. You can see it in action if you log into the workers service and run ps aux.

The setup of the Django static and media files and of wsgi.py is already covered in the Django tutorial, so only the dotcloud.yml and supervisord.conf are detailed here.

The dotCloud Build File

The dotCloud build file is straightforward and just describes the architecture of our application on dotCloud, with its three services:

  • a web server pre-configured for Python WSGI-compatible applications like Django;
  • a service to launch workers;
  • a PostgreSQL server.

dotcloud.yml:

www:
    type: python

workers:
    type: python-worker

db:
    type: postgresql

The Supervisor Configuration Include

The Supervisor configuration include, named supervisord.conf, is used to launch and monitor the Celery workers and the celerycam daemon:

[program:djcelery]
directory = $HOME/current/
command = python minestrone/manage.py celeryd -E -l info -c 2
stderr_logfile = /var/log/supervisor/%(program_name)s_error.log
stdout_logfile = /var/log/supervisor/%(program_name)s.log

[program:celerycam]
directory = $HOME/current/
command = python minestrone/manage.py celerycam
stderr_logfile = /var/log/supervisor/%(program_name)s_error.log
stdout_logfile = /var/log/supervisor/%(program_name)s.log

You may have recognized a piece of an .ini file; let’s break it down:

[program:djcelery]

Defines a new background process configuration block. The process is called “djcelery” here.

directory = $HOME/current/
command = python minestrone/manage.py celeryd -E -l info -c 2

These two lines tell Supervisor how the background process should be launched:

  • In the directory /home/dotcloud/current, which is where the dotCloud builder installs your application and thus where the Django application lives;
  • Using the command: “python minestrone/manage.py celeryd -E -l info -c 2”.

stderr_logfile = /var/log/supervisor/%(program_name)s_error.log
stdout_logfile = /var/log/supervisor/%(program_name)s.log

These two lines redirect the output of the workers into the two log files; “%(program_name)s” will be replaced by “djcelery”. If you omit them, Supervisor will create the log files for you, but with less readable (random) names.

The exact same thing is repeated for the celerycam daemon.

This configuration include is generated from the postinstall build hook, but only if we are on the “workers” service:

# postinstall:

# …

dotcloud_get_env() {
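    # Poor man's JSON parsing: print the quoted value found on the
    # lines of environment.json whose key matches $1.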
    sed -n "/$1/ s/.*: \"\(.*\)\".*/\1/p" < "$HOME/environment.json"
}

setup_django_celery() {
    cat > $HOME/current/supervisord.conf << EOF
[program:djcelery]
directory = $HOME/current/
command = python minestrone/manage.py celeryd -E -l info -c 2
stderr_logfile = /var/log/supervisor/%(program_name)s_error.log
stdout_logfile = /var/log/supervisor/%(program_name)s.log

[program:celerycam]
directory = $HOME/current/
command = python minestrone/manage.py celerycam
stderr_logfile = /var/log/supervisor/%(program_name)s_error.log
stdout_logfile = /var/log/supervisor/%(program_name)s.log
EOF
}

if [ `dotcloud_get_env DOTCLOUD_SERVICE_NAME` = workers ] ; then
    setup_django_celery

# …

Conclusion

Django and Celery are very easy to get running on dotCloud. Let’s review what we did here:

  1. Wrote a dotcloud.yml with all the services we need;
  2. Set up a RabbitMQ server and added its environment variables to our application;
  3. Configured Celery and PostgreSQL in settings.py;
  4. Defined some tasks in the tasks.py module of the Django application;
  5. Launched some workers using a supervisord.conf file on the Python worker service;
  6. Enqueued some tasks from a Django view using the “apply_async” method.

The example application lives on: http://django-celery.dotcloudapp.com/.

You can clone the code from https://bitbucket.org/lopter/dj-celery/, “cd” into it, create your application with the flavor of your choice, and push it to your own dotCloud account with:

$ dotcloud create djcelery
$ dotcloud push djcelery

While you are fiddling with the web interface, you can watch the jobs being performed in the logs of the workers service:

$ dotcloud logs djcelery.workers

You can also have a look at the Django administration panel at http://<yourapp-url>/admin/. The user name is “admin”, with the default password (configured in mkadmin.py): “password”.
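
If you are curious, here is a hypothetical sketch of what such an admin-bootstrapping script can boil down to (the real mkadmin.py lives in the repository):

# A hypothetical sketch of an admin-bootstrapping script.
import os
os.environ.setdefault('DJANGO_SETTINGS_MODULE', 'minestrone.settings')

from django.contrib.auth.models import User

# Create the superuser only once, with the default password used in
# this tutorial:
if not User.objects.filter(username='admin').exists():
    User.objects.create_superuser('admin', 'admin@example.com', 'password')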

Celery is very powerful, especially when coupled with RabbitMQ; you are highly encouraged to take a look at its excellent documentation: http://celery.readthedocs.org/en/latest/.


[1] The Minestrone soup is a famous Italian dish that contains some Celery…
[2] http://celery.readthedocs.org/en/latest/userguide/tasks.html#tips-and-best-practices