
Build a Production-Ready API with Rate-Limiter in 15 minutes

Published Feb 14, 2021 · Last updated Aug 12, 2021

This post is a tutorial on how to build a production-ready API with a rate-limiter, in 15 minutes and with 100% reproducibility.

Once you're finished reading it, you will have:

  • Learned how to make an API production-ready and reproducible.
  • Learned how to add a rate-limiter to the API.
  • Learned how to use production-ready storages for API caching.
  • Got a fully reproducible project template (via GitHub) that you can use to kickstart similar applications.

The story

I ran into a nice project this past week: Flask-limiter. I found it very useful for adding a rate limiter to a medium-sized API that I was building. It was quite easy to use, so I really recommend the package.

In hindsight, though, the package's documentation lacks a bit of reproducibility guidance for a production environment. As I have said many times, reproducibility matters.

I want to show here what I did for deploying a Flask API in production with the rate-limiter, so everybody can reproduce it quickly and easily.

The tools you will need are:

  • Flask and Flask-limiter.
  • Gunicorn, a good production server.
  • Docker & Docker-Compose.
  • ... nothing else!

Post overview

First of all, I will show how to spin up an API. This process will take just a couple of minutes if you follow along. The API will be simple but not simplistic: a good example of a real-world API.

Then, before playing with the rate-limiter, I will illustrate how to make the API reproducible and production-ready. I believe doing it at this exact moment is important, because conceptually the API is just one microservice in a multi-service architecture. As such, it must be robust on its own.

I will then introduce the rate-limiter concepts and show how to build a prototype that integrates with the API. This will take just a couple of minutes!

Finally, I will show how to use a robust, production-ready storage (Memcached) to deploy the system made of two microservices: the API and the cache storage.

The entire tutorial can be reproduced using my GitHub repository for this project.

The API (or website)

Setting up a minimal API is very simple, thanks to the official Flask documentation.

However, many examples use the quick-and-dirty single-file app.py. That's NOT production-ready, in my opinion.

Production is where you stop hacking code and instead think about systems and their reliability. The code must be in good shape so that it is easily maintainable. The Flask tutorial does a good job of explaining how to organize the code for a production-ready app (or API).

First of all, create a new virtualenv and activate it.

python -m venv ~/.virtualenvs/prod-api
source ~/.virtualenvs/prod-api/bin/activate

Setup the project root directory.

mkdir prod-api
cd prod-api

For the sake of this article the API's logic is going to be simple, so I will only need Flask, Flask-limiter, and Gunicorn, which is a production-ready web server.

pip install flask
pip install flask-limiter
pip install gunicorn
pip freeze > requirements.txt

For my Python projects I start from the Python .gitignore file that's available in GitHub's official gitignore repository.

git init
wget https://raw.githubusercontent.com/github/gitignore/master/Python.gitignore -O .gitignore

I want to have one endpoint /test and another /resource/test. This is a simplified version of a real-world case, where your API has some default endpoint listening on / and many more endpoints to manage specific resources, all in keeping with RESTful ideas.

To do that, I will first create the logic for the REST resource. Put the following code in a new file flaskr/resource.py.

# flaskr/resource.py

from flask import (
    Blueprint, request, jsonify
)

bp = Blueprint('resource', __name__, url_prefix='/resource')


@bp.route('/test', methods=('GET', 'POST'))
def test():
    if request.method == 'POST':
        response = {'message': 'This was a POST'}
    else:
        response = {'message': 'This was a GET'}
    return jsonify(response), 200

The code above uses Flask's Blueprint object to create a very simple endpoint that echoes back to the caller whether they sent a GET or a POST request.

The blueprint must be registered with the main app. So, I have to create another file, flaskr/__init__.py. Here is the code for this file.

# flaskr/__init__.py

import os

from flask import Flask

def create_app(test_config=None):
    # create and configure the app
    app = Flask(__name__, instance_relative_config=True)

    if test_config is None:
        # load the instance config, if it exists, when not testing
        app.config.from_pyfile('config.py', silent=True)
    else:
        # load the test config if passed in
        app.config.from_mapping(test_config)

    # ensure the instance folder exists
    try:
        os.makedirs(app.instance_path)
    except OSError:
        pass

    # a simple endpoint that says hello
    @app.route('/test')
    def hello():
        return 'Hello, World!'

    # register the blueprint
    from . import resource
    app.register_blueprint(resource.bp)

    return app

This app with one blueprint is complex enough to let me simulate a real-world, production-ready scenario, but it is also simple enough to let me focus on the real goal: using Flask-limiter in production in a reproducible way.

Before going further, I want to check that everything works. I can start the development server locally and send a few requests.

In one shell session, do:

FLASK_APP=flaskr flask run

and you will see the development server starting. You will also see a red warning that says

WARNING: This is a development server. Do not use it in a production deployment.

which is exactly one of the problems we will solve shortly.

For now, start another shell session and test the three endpoints: GET /test, GET /resource/test and POST /resource/test:

➜  prod-api git:(master) ✗ curl localhost:5000/test                
Hello, World!
➜  prod-api git:(master) ✗ curl -XPOST localhost:5000/resource/test
{"message":"This was a POST"}
➜  prod-api git:(master) ✗ curl localhost:5000/resource/test       
{"message":"This was a GET"}

Everything seems to work, and your directory should look like the following (don't worry about the __pycache__ sub-folder):

.
├── flaskr
│   ├── __init__.py
│   ├── __pycache__
│   └── resource.py
├── instance
└── requirements.txt

Make the app production-ready

Even before going into the intricacies of a rate-limiter, I want to stop for a bit and make sure the Flask app, and just the app, is production-ready.

The problems to solve towards this goal are:

  • Use a robust server (gunicorn) instead of the development server.
  • Make sure that the app is fully reproducible so that it can be deployed easily in the cloud, and the code can be maintained and extended more easily.

Here is where Docker will make our life easier.

The Dockerfile to achieve these goals is very simple:

FROM python:3.7-slim

# Refresh the package index (handy if you later add system packages)
RUN apt-get update

# Install the Python dependencies first, to leverage Docker's build cache
COPY requirements.txt /tmp/requirements.txt
RUN pip install -r /tmp/requirements.txt

# Copy the application code into the image
COPY . .

# Run the app factory with gunicorn: bind to 0.0.0.0:5000, 3 worker processes
CMD ["gunicorn", "flaskr:create_app()", "-b", "0.0.0.0:5000", "-w", "3"]

With that you can run the API in any server with two commands in the shell:

docker build -t prod-api .
docker run --rm -p 5000:5000 prod-api

This works regardless of your cloud provider (AWS, GCP, etc.) and regardless of the type of server (Linux, Windows, etc.). In all cases, the API will be running in the server and listening on port 5000. You can then attach a load balancer, a reverse proxy, and so on, but that is a story for another article.
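
One small refinement I would also suggest here (it is not in the original Dockerfile): add a .dockerignore file next to the Dockerfile, so that COPY . . does not drag the Git history and Python cache files into the image. Something like:

# .dockerignore (a suggested addition)
.git
__pycache__/
*.pyc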

The need for a rate-limiter

If you expose a service through an API that allows users to access resources (which in practice means access to databases, files, nested API calls and complex logic), then you should definitely consider adding a rate limiter to the API.

Rate-limiting an API simply means setting a maximum number of requests, in a given timeframe, for each resource. For example, you may want to allow only 1 request every 10 seconds on the resource /resource/test. One request every ten seconds is a bit too strict, but you get the point: a user should not abuse the API, because that would badly affect other users.

That's why you need an additional piece of software that blocks users when they send too many requests in a short time span. This piece of software is called a rate-limiter.
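
To make the idea concrete, here is a minimal sketch of the core logic in Python (a naive fixed-window counter, for illustration only; it is not how Flask-limiter is implemented internally):

# naive_limiter.py -- a toy fixed-window rate-limiter, for illustration
import time

class FixedWindowLimiter:
    def __init__(self, max_requests, window_seconds):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self.counters = {}  # (client, window index) -> count of requests

    def allow(self, client):
        # all timestamps inside the same window share one counter
        window = int(time.time()) // self.window_seconds
        key = (client, window)
        self.counters[key] = self.counters.get(key, 0) + 1
        return self.counters[key] <= self.max_requests

limiter = FixedWindowLimiter(max_requests=1, window_seconds=10)
print(limiter.allow('1.2.3.4'))  # True: first request in this window
print(limiter.allow('1.2.3.4'))  # False: limit exceeded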

Flask-limiter is a very easy-to-use package that accomplishes this goal: very clear documentation and good, Pythonic code. However, it has two small problems in my opinion:

  1. The docs do not explain how to use the limiter in each individual blueprint's endpoint (but luckily, the solution is out there, just in a different place).
  2. The docs do not help much about a production deployment.

I solved the first problem with an old trick: I went to the project's issues page on GitHub and searched for the keyword "blueprint". And I found the solution immediately!

Let's implement it then! Create a new file flaskr/core.py and put the following code in it.

# flaskr/core.py

from flask_limiter import Limiter
from flask_limiter.util import get_remote_address

limiter = Limiter(key_func=get_remote_address)

Then edit flaskr/resource.py to link the limiter with the blueprint. Here is the modified file; I have marked the new lines. (Note that get_remote_address keys the limits by the client's IP address.)

# flaskr/resource.py

from flask import (
    Blueprint, request, jsonify
)

from .core import limiter  # <------------ New line


bp = Blueprint('resource', __name__, url_prefix='/resource')

# Set a default limit of 1 request per second,
# which can be changed granularly in each route.
limiter.limit('1/second')(bp)      # <------------ New line


@bp.route('/test', methods=('GET', 'POST'))
@limiter.limit('1 per 10 second') # <------------ New line
def test():
    if request.method == 'POST':
        response = {'message': 'This was a POST'}
    else:
        response = {'message': 'This was a GET'}
    return jsonify(response), 200

The limiter also needs to be initialized and linked with the main app, which requires some changes in the flaskr/__init__.py file. Here's the updated file with comments on the new lines.

# flaskr/__init__.py

import os

from flask import Flask

from .core import limiter  # <------- New line


def create_app(test_config=None):
    # create and configure the app
    app = Flask(__name__, instance_relative_config=True)
    limiter.init_app(app) # <--------------------------------- New line

    if test_config is None:
        # load the instance config, if it exists, when not testing
        app.config.from_pyfile('config.py', silent=True)
    else:
        # load the test config if passed in
        app.config.from_mapping(test_config)

    # ensure the instance folder exists
    try:
        os.makedirs(app.instance_path)
    except OSError:
        pass

    # a simple endpoint that says hello
    @app.route('/test')
    def hello():
        return 'Hello, World!'

    # register the blueprint
    from . import resource
    app.register_blueprint(resource.bp)

    return app

Note: To test multiple scenarios, I have added a rate-limit only on the resource endpoints (/resource/test, both GET and POST), but not on the root endpoint (GET /test).

At this point you should have a folder structure like the following one (again, don't worry about the .pyc files):

.
├── Dockerfile
├── flaskr
│   ├── core.py
│   ├── __init__.py
│   ├── __pycache__
│   │   ├── core.cpython-39.pyc
│   │   ├── __init__.cpython-39.pyc
│   │   └── resource.cpython-39.pyc
│   └── resource.py
├── instance
└── requirements.txt

Let's see if this works!

Rebuild the Docker image and run it again:

docker build -t prod-api .
docker run --rm -p 5000:5000 prod-api

Then, in a different shell session, call the same endpoint twice in quick succession:

➜  prod-api git:(master) ✗ curl -XPOST localhost:5000/resource/test
{"message":"This was a POST"}
➜  prod-api git:(master) ✗ curl -XPOST localhost:5000/resource/test
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<title>429 Too Many Requests</title>
<h1>Too Many Requests</h1>
<p>1 per 1 second</p>

It worked! The limiter correctly rejected the second request, because less than 10 seconds had passed since the first one was sent. (The error page reports the blueprint-level default of 1 per second, which was also exceeded.)

Let's check with the root endpoint.

➜  prod-api git:(master) ✗ curl localhost:5000/test                
Hello, World!
➜  prod-api git:(master) ✗ curl localhost:5000/test
Hello, World!
➜  prod-api git:(master) ✗ curl localhost:5000/test
Hello, World!
➜  prod-api git:(master) ✗ curl localhost:5000/test
Hello, World!

That works too! Because I didn't set any limit on this endpoint, the API allows me to send as many requests as I like.

At this point, we have a Docker image running a good production server (gunicorn) with a Flask API that has blueprints with rate limit enabled.

Are we done yet?

Not quite yet.

What's the problem? The problem is that, by default, Flask-limiter uses an "in-memory" storage.

Wait, what? Yes, the limiter needs to use a physical storage.

Why? Because it needs to keep track of the requests received in the past (and their timestamps, at least), to understand if a new incoming request can be allowed or not.

"In-memory" means the limiter simply keeps a pointer to a local variable (imagine it like a python list), that tracks the past requests.

This is the default in Flask-limiter because it's convenient for development. But it is nothing like production!

In production you would normally have multiple containers serving the same API, and they all have to be connected to the same limiter storage. Otherwise, a request might be received by a container that cannot check the "in-memory" storage of another container...and that would make the whole system useless.
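
Here is a tiny sketch of the failure mode, with two Python dictionaries standing in for the private memories of two containers:

# Two containers, each with its own "in-memory" storage: the effective
# limit gets multiplied by the number of containers.
def allow(storage, client, max_requests=1):
    storage[client] = storage.get(client, 0) + 1
    return storage[client] <= max_requests

container_a = {}  # counters known only to container A
container_b = {}  # counters known only to container B

# The same client sends two requests, and the load balancer
# routes one request to each container.
print(allow(container_a, '1.2.3.4'))  # True
print(allow(container_b, '1.2.3.4'))  # True -- it should have been blocked!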

Flask-limiter is well designed and supports production-ready types of storage. In particular, Memcached and Redis.

So let's see how to use Memcached for a production setup.

Let's think this through for a minute. In production we will have many containers (we might not even know the actual number at any given moment), and they all have to use the same memcached service; otherwise, the request counters would not work correctly.

This means that we have to set up a small microservice architecture. One service will be the API itself, another service will be the memcached storage. The limiter in the first service (API) will connect to the memcached storage in the second service.

If this sounds complicated, don't worry! Docker is here to help us again.

In fact, Memcached is readily available as a Docker image. In practice you just need to run two commands:

docker pull memcached
docker run --rm -it --network host memcached

and you will have Memcached running on your computer and listening on port 11211. That's very easy and... just amazing!

Now we can go back to the Python project and tell the limiter to use memcached instead of the default "in-memory" storage.

First of all, we need the Python bindings for memcached:

pip install pymemcache
pip freeze > requirements.txt
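
As an optional sanity check (assuming the memcached container from before is still running and listening on port 11211), you can verify that the bindings reach it:

# check_memcached.py -- optional connectivity check
from pymemcache.client.base import Client

client = Client(('localhost', 11211))
client.set('ping', 'pong')
print(client.get('ping'))  # prints b'pong' if memcached is reachable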

Then we need to change the limiter options so that it connects to the memcached service. Right? Mmmm...be prepared for the twist!

Here's how you should change the flaskr/core.py file, according to the Flask-limiter documentation.

# flaskr/core.py

from flask_limiter import Limiter
from flask_limiter.util import get_remote_address

port = '11211'  # <---- New line
host = ...      # <---- New line, but what to use as host address?
memcached_uri = f'memcached://{host}:{port}'  # <--- New line
limiter = Limiter(storage_uri=memcached_uri,  # <---- Line changed
                  key_func=get_remote_address)

You may be thinking that the host should be localhost, or 0.0.0.0, since the memcached service is running on the same machine (your computer).

That makes sense, but things are not so easy. Depending on the operating system you are using, Docker's --network host option may not work consistently. In particular:

  • If you are using a Mac, then you will need to use host.docker.internal.
  • If you are on Linux, you can use localhost.

I was using Linux when writing this code, so host = 'localhost'. Make sure to change it if you are running on a Mac.
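
By the way, one way to avoid editing the code for each OS is to read the host from an environment variable with a sensible default. This is my own suggestion, not part of the original repository, and MEMCACHED_HOST is just a name I made up:

# flaskr/core.py -- variant that reads the host from the environment
import os

from flask_limiter import Limiter
from flask_limiter.util import get_remote_address

port = '11211'
host = os.environ.get('MEMCACHED_HOST', 'localhost')  # hypothetical variable
memcached_uri = f'memcached://{host}:{port}'
limiter = Limiter(storage_uri=memcached_uri,
                  key_func=get_remote_address)

On a Mac you would then run the container with docker run --rm --network host -e MEMCACHED_HOST=host.docker.internal prod-api, without touching the code.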

After that, you can start the API again with two commands:

docker build -t prod-api .
docker run --rm --network host prod-api

And test again:

➜  prod-api git:(master) ✗ curl localhost:5000/resource/test
{"message":"This was a GET"}
➜  prod-api git:(master) ✗ curl localhost:5000/resource/test
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<title>429 Too Many Requests</title>
<h1>Too Many Requests</h1>
<p>1 per 10 second</p>

Then wait 10 (or more) seconds.

➜  prod-api git:(master) ✗ curl localhost:5000/resource/test
{"message":"This was a GET"}

It works!

Are we done now?

Well, I would say not. Not quite yet!

In theory, you could keep things as they are now and use the following steps for a production deployment.

  1. Log in to your server and start the memcached service manually, with docker pull memcached and docker run -itd --network host memcached.
  2. Pull your code from some repository.
  3. Edit the code so that the host variable in flaskr/core.py is correctly set. As I explained, this depends on the OS your server is using.
  4. Then run the API service, again manually, with docker build -t prod-api . and docker run --rm -itd --network host prod-api.

This will work fine. But, in my opinion, this is not a "production" system. There are too many manual steps and too many things that can go wrong.

What if you forget to start the memcached service?

What if you have to spin up a new server and it's a different OS?

What if... too much trouble.

We need one more tool to solve this last remaining problem and get a fully reproducible, production-ready solution.

Enter Docker Compose

Docker Compose automates the task of creating a network of Docker services that can all talk to each other.

To create a network with the docker-compose program, you need a docker-compose.yml file that specifies the network configuration. Here it is for our API and memcached services.

services:
  api:
    build:
      dockerfile: Dockerfile
      context: .
    ports:
      - "5000:5000"
    restart: "always"
  memcached:
    container_name: memcached
    image: memcached:latest
    ports:
      - "11211:11211"
    restart: "always"

This file is very easy to understand:

  • There are two services in the network, named api and memcached.
  • The api service is built with the Dockerfile found in the current folder (the "context").
  • The api service is available in the network on port 5000.
  • The memcached service doesn't use a Dockerfile; instead, it is pulled from Docker Hub via the memcached:latest image.
  • It's available in the network on port 11211.
  • Both services will always be restarted if they fail.
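
One optional refinement, not in the file above: declare that the api service depends on memcached, so that Compose starts the cache first (note that depends_on controls the start order, not readiness):

  api:
    # ... same configuration as above, plus:
    depends_on:
      - memcached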

The main advantage of this is that in a Docker network each service is assigned a unique host name, and that host name is the same as the service name.

This means that the memcached service will be reachable in the network at the host memcached (the name of the service) and on port 11211.

Therefore, the URI string to connect to it will be memcached:11211, no matter what OS your computer (or server) is using.

Now, that's full reproducibility!

Let's go back to the flaskr/core.py file and change it for the last time.

# flaskr/core.py

from flask_limiter import Limiter
from flask_limiter.util import get_remote_address

port = '11211'
host = 'memcached'   # <--------------- Changed line
memcached_uri = f'memcached://{host}:{port}'
limiter = Limiter(storage_uri=memcached_uri,
                  key_func=get_remote_address)

And finally, we can spin up the whole backend (API with limiter + cache storage) with one command:

docker-compose up
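
As a final check, you can repeat the earlier experiment against the composed stack; the counters now live in the memcached service, shared by every API worker and container:

curl localhost:5000/resource/test   # {"message":"This was a GET"}
curl localhost:5000/resource/test   # 429 Too Many Requests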

Now, we are done!

If you run into any problems reproducing and deploying this project, let me know!

Comments
Dominick Blue
2 years ago

Hi Pietro, this is a great article. Thank you!

I am having trouble running the test after adding Memcached on my Mac. I receive:

curl: (7) Failed to connect to localhost port 5000 after 7 ms: Connection refused

I can confirm I’m using host.docker.internal as the host as well.

Pietro Grandinetti, PhD
2 years ago

Hi Dominick - thanks for the feedback.

The error you receive mentions port 5000, that’s the Flask service port (not Memcached). Hence, the problem must be in the Flask app.

Can you copy the full shell session here (including the commands you ran), so I can take a look at the logs?

Abdur-Rahmaan Janhangeer
2 years ago

Great article, in-depth and well covered, the kind of content we need on Codementor!

Mamba
3 years ago

Thanks Pietro for this tutorial. It was super easy to follow and to implement, yet not so simple that it feels like 'another hello world' tutorial. Keep up the good work!!

btw, the line from flask.ext.limiter import Limiter gave me the error No module named flask.ext, but from flask_limiter import Limiter fixed it.

Pietro Grandinetti, PhD
3 years ago

Thanks Mamba, your comment is very important because it lets me know that this is useful and that I can help out!

Thanks also for pointing out the error. The code in GitHub contained the correct code already, now thanks to you I fixed it in the tutorial too. Thanks!!
