How I learned Celery

Published Apr 02, 2018; last updated Sep 29, 2018

About me

An advocate for new technology and improving the human condition with it.

Why I wanted to learn Celery

In single-process applications, managing those sometimes rogue I/O-intensive tasks can be troublesome. Within the same application there are options such as multiprocessing or threads, but I wanted something closer to submit-and-forget.

The idea of an async queue really fascinated me. Sending a job to a queue on a remote server and having another machine consume those tasks as it had time really sparked my imagination. You can have many producers and many consumers, with all processing separate from the main application and running autonomously.

For my application, this meant a seamless and responsive experience for users while a myriad of other tasks ran in the background.

How I approached learning Celery

My project involved several email-based notifications, file operations, and maintenance tasks. Sending these notifications can be a serious bottleneck for the application, so I experimented with asynchronous approaches and found an elegant solution. Experimenting with many different approaches to a problem can yield great results.

I tried threads first, but found that they slowed down processing for the entire application. Although this performed slightly better than doing the work in the main thread, it wasn't scalable.
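
The thread-based attempt looks roughly like the minimal sketch below. The names send_notification and notify_in_background are hypothetical, and time.sleep stands in for real SMTP work. The caller gets control back immediately, but the work still runs inside the application's own process, so it shares that process's resources and cannot be spread across other machines.

```python
import threading
import time


def send_notification(recipient, sent_log):
    """Hypothetical stand-in for a slow email send (sleeps instead of using SMTP)."""
    time.sleep(0.1)  # simulate network latency
    sent_log.append(f"sent to {recipient}")


def notify_in_background(recipient, sent_log):
    """Start the send on a new thread and return immediately (fire-and-forget)."""
    worker = threading.Thread(target=send_notification, args=(recipient, sent_log))
    worker.start()
    return worker  # returned only so a demo or test can join() it


if __name__ == "__main__":
    log = []
    thread = notify_in_background("user@example.com", log)
    print("request handled; email still in flight")
    thread.join()  # a real web request would not wait here
    print(log)  # → ['sent to user@example.com']
```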

Next, I tried running the work in its own process. Creating and then managing a process seemed like too much overhead. Although performance was much better than with threads, it still carried too much overhead to deal with.

The first step in any distributed task queue is the broker, which both the producer and the consumer use to communicate. The broker is somewhat like a database, but with a simpler job: it holds task messages, which carry identifying information about each task and its inputs, until a worker picks them up. The status and results of tasks are kept in a result backend, which Celery configures separately (and which can also be Redis).
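
To make the producer, broker and consumer roles concrete, here is a toy, in-memory model using only the Python standard library. It is not how Celery is implemented: queue.Queue plays the broker, a dict plays the store where results end up, and the "add" task and all function names are hypothetical. Two worker threads consume from the same queue to mirror the many-consumers idea.

```python
import itertools
import queue
import threading

# Toy model only: queue.Queue stands in for the broker, a dict for the result store.
broker = queue.Queue()
results = {}
results_lock = threading.Lock()
_ids = itertools.count(1)

TASKS = {"add": lambda x, y: x + y}  # hypothetical task registry


def produce(task_name, *args):
    """Producer: publish a task message (id, name, inputs) and return its id at once."""
    task_id = next(_ids)
    broker.put({"id": task_id, "task": task_name, "args": args})
    return task_id


def consume():
    """Consumer (worker): run messages until a None stop sentinel arrives."""
    while True:
        message = broker.get()
        if message is None:
            break
        value = TASKS[message["task"]](*message["args"])
        with results_lock:
            results[message["id"]] = value


workers = [threading.Thread(target=consume) for _ in range(2)]
for w in workers:
    w.start()
ids = [produce("add", i, i) for i in range(5)]
for _ in workers:
    broker.put(None)  # one stop sentinel per worker, queued after all tasks
for w in workers:
    w.join()
print([results[i] for i in ids])  # → [0, 2, 4, 6, 8]
```

The producer never waits for a result; it only gets an id back, which is the same submit-and-forget shape a real broker provides across machines.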

Multiple types of brokers are supported:

RabbitMQ
Redis
Amazon SQS
Zookeeper

In my case, I chose Redis. Installation is simple and straightforward: dnf install redis, boom, done.

Installing Celery is just as easy: pip install celery, or, to get the Redis dependencies bundled in, pip install celery[redis]
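
Assuming Redis is running locally on its default port (6379), a minimal Celery configuration module might look like the sketch below. The module name celeryconfig.py and the use of Redis database numbers 0 and 1 are assumptions; the setting names are Celery's lowercase settings introduced in 4.0.

```python
# celeryconfig.py: minimal configuration sketch (module name is an assumption).
# Assumes a local Redis server on its default port, 6379.

# Broker: where producers publish task messages and workers pick them up.
broker_url = "redis://localhost:6379/0"

# Result backend: where task states, progress metadata and return values are stored.
# A second Redis database number keeps it apart from the broker (an arbitrary choice).
result_backend = "redis://localhost:6379/1"

# Other brokers use different URL schemes, for example:
#   RabbitMQ:   "amqp://guest:guest@localhost:5672//"
#   Amazon SQS: "sqs://"

# Serialize task arguments and results as JSON only.
task_serializer = "json"
result_serializer = "json"
accept_content = ["json"]
```

An app can load this with app.config_from_object('celeryconfig'), and a worker can then be started with celery -A tasks worker --loglevel=INFO, where tasks is the assumed module that defines the app.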

From there you are ready to play.

Challenges I faced

Change is everywhere when you try something new. Probably one of the greatest pitfalls is adapting to someone else's coding style or to how they interact with the dataset. A common challenge for me is taking the time, and being patient enough, to understand the approach that was taken.

Probably the most problematic issue with Celery is the lack of visibility into the status and results of a running task. The documentation isn't always your friend and can leave you more confused, but trial and error usually resolves most questions.

Once you succeed in getting status back the first time, it's easy after that:

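import os
import random
import string

from celery import Celery
from celery.states import STARTED

# Assumed app instance; update_state() needs a result backend to store progress,
# so Redis is used here as both broker and result backend.
celery = Celery('tasks', broker='redis://localhost:6379/0',
                backend='redis://localhost:6379/1')

@celery.task(bind=True)
def database_backup(self):
    """Database backup - runs in the background.

    :note: Encryption cycle is intensive and will take several minutes to complete
    :param self: the bound task instance (available because bind=True)
    :return:
    """
    # Random 16-letter name for a second temporary file
    filename = ''.join(random.choice(string.ascii_letters) for _ in range(16))
    tmp_db = '/tmp/db_backup.sql'
    tmp_db2 = os.path.join('/tmp', filename)

    # Remove any stale dump left over from a previous run
    if os.path.exists(tmp_db):
        os.remove(tmp_db)
    # Publish progress so anyone polling this task's result can see it
    self.update_state(state=STARTED, meta={'message': 'database backup started'})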

The last line is the key: update_state is how you keep your application informed of the status of a task that was sent to the queue.
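
On the application side, the state and metadata written by update_state can be read back through the task's AsyncResult, whose state and info attributes hold the current state name and the meta dictionary. The helper below, describe_progress, is hypothetical; it only relies on those two attributes, so the demo uses a stand-in object instead of a live broker and result backend.

```python
from types import SimpleNamespace


def describe_progress(result):
    """Summarise task status from anything with .state and .info (e.g. a Celery AsyncResult).

    describe_progress is a hypothetical helper, not part of Celery.
    """
    if result.state == "PENDING":
        # Celery also reports PENDING for task ids it has never seen.
        return "PENDING: waiting in the queue (or unknown task id)"
    if result.state == "FAILURE":
        # For failed tasks, .info holds the exception that was raised.
        return f"FAILURE: {result.info!r}"
    if isinstance(result.info, dict) and "message" in result.info:
        # Metadata passed via update_state(meta=...) shows up in .info.
        return f"{result.state}: {result.info['message']}"
    return result.state


# In the application (assumes the database_backup task and `celery` app shown earlier):
#   from celery.result import AsyncResult
#   task_id = database_backup.delay().id   # returns immediately
#   ...later, e.g. in a status endpoint...
#   print(describe_progress(AsyncResult(task_id, app=celery)))

if __name__ == "__main__":
    # Stand-in with the same two attributes, so this runs without a broker.
    fake = SimpleNamespace(state="STARTED", info={"message": "database backup started"})
    print(describe_progress(fake))  # → STARTED: database backup started
```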

Key takeaways

As always, the takeaway from any of these experimentation sessions is something you didn't know before. Learning about async queues opened up a whole new world of scalability for the applications I work with.

New rule: if something needs to get done but you don't depend on its immediate result, async processing can really breathe more performance into an application.

Tips and advice

My advice for most new technologies is to create a virtual workspace for yourself, one you can play in and make a mess of. The next and final step is to play, make a mess, and try all the flavors.

Final thoughts and next steps

For anyone who needs a scalable task queue, I've found a good solution in Celery. You can get started at http://www.celeryproject.org/, where you will find basic examples and guidance.

Richard Lowe

Embedded Systems Expert and Professional Educator

I have 15+ years of embedded systems development as well as 10 years in the educational field. I've worked in the management of engineering teams as well as a professor, both have given me perspective on how to approach challenges...
