Sheena

Python expert with a focus on web technologies, microservices and devops. I also do some frontend work (React and Angular experience)

GraphQL versus REST

Published Dec 06, 2018; last updated Jun 03, 2019

Sounds pretty straightforward, right? REST APIs are what we're all used to, whereas GraphQL can beat Chuck Norris in an arm-wrestling contest and is composed entirely of unicorns.

OK... not really, but a lot of people act like that's the case.

For those of you who are unfamiliar with GraphQL, it's a text-based query language specification created by Facebook. It has a few open source implementations and is supported by a solid ecosystem of awesome tools. It got pretty popular pretty quickly. As a testament to its awesomeness, even GitHub is a fan.

But...what is it?

This article aims to give a solid introduction to GraphQL by comparing it to REST. By the end, you should have a good idea of what GraphQL can do for you and whether it's worth exploring further or not.

And if you think it is worth exploring further, then stay tuned! This is the first in a series of GraphQL articles.

A Totally Imaginary Application

Let's say you have a Flask-based web application that manages a bunch of code repositories, and those repositories have names, descriptions, and a few other attributes. And let's say different users have access to different repositories. You store this data in your preferred database. It might be relational, it might be NoSQL; it won't make a huge difference for this article.

Let's say your classes look something like this:

Now let's say you have a few users and a few repos. You can picture your data as some kind of graph. For the rest of this article we'll pretend your data looks like this:

Now for some graph-theory lingo: You see those round things on the graph? Those are called "nodes". We have nodes that represent Users, and we have nodes that represent Repos. The lines between the nodes are called "edges".
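To make the node/edge vocabulary concrete, here is a minimal in-memory sketch of such a graph. The class names, fields, and sample values are hypothetical, and the edges are chosen only to be consistent with the example tables further down (User 1 shares Repo A with User 2; User 4 shares Repo C with Users 2 and 3, and Repo D with User 5). Repos not mentioned in those tables are left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    id: int
    username: str
    avatar: str


@dataclass(frozen=True)
class Repo:
    id: str
    name: str
    description: str


# Nodes: hypothetical users and repos.
users = {i: User(i, f"user{i}", f"https://example.com/avatars/{i}.png") for i in range(1, 6)}
repos = {key: Repo(key, f"Repo {key}", f"Description of Repo {key}") for key in ("A", "C", "D")}

# Edges: (user_id, repo_id) pairs meaning "this user has access to this repo".
edges = {(1, "A"), (2, "A"), (4, "C"), (2, "C"), (3, "C"), (4, "D"), (5, "D")}


def repos_for_user(user_id):
    """Follow edges from a User node to its Repo nodes."""
    return sorted(repo_id for (u, repo_id) in edges if u == user_id)


def users_for_repo(repo_id):
    """Follow edges from a Repo node back to its User nodes."""
    return sorted(u for (u, r) in edges if r == repo_id)


print(repos_for_user(4))    # ['C', 'D']
print(users_for_repo("C"))  # [2, 3, 4]
```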

Creating REST Endpoints to GET Data

I chose Flask for this tutorial because it's lightweight and fairly intuitive. You don't actually need to know Flask to follow along: if you have ever created a web application, a lot of this should look familiar. I'm using Flask to demonstrate the concepts, so focus on those concepts rather than on Flask itself.

Your Flask application might have some code like this:

from flask import Flask

app = Flask(__name__)


@app.route('/repos')   # this is a URL pattern, or route
def list_all_repos():  # this function generates an HTTP response
    ...

@app.route('/users')
def list_all_users():
    ...

@app.route('/users/<int:user_id>')
def get_one_user(user_id):
    ...

@app.route('/users/<int:user_id>/repos')
def get_one_user_repo_list(user_id):
    """return minimal info about all the repos the user has access to"""
    ...

@app.route('/repos/<int:repo_id>')
def get_one_repo(repo_id):
    ...

@app.route('/repos/<int:repo_id>/users')
def get_one_repo_user_list(repo_id):
    """return minimal info about all the users that have access to this repo"""
    ...

Simple enough. So your view for /repos should return a list of repo IDs plus minimal information about those repos, such as names and descriptions. And your view for /repos/<int:repo_id> would need to return all the information about the repo: the name, description, primary language, list of associated user IDs, etc. Same deal for users. How very neat and tidy.
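Here's a framework-free sketch of that split between a list view and a detail view. Plain functions stand in for the Flask views, and the field names (`primary_language`, `user_ids`) and sample data are hypothetical.

```python
# Hypothetical backing data: repo_id -> every stored attribute.
REPOS = {
    1: {"name": "alpha", "description": "First repo",
        "primary_language": "Python", "user_ids": [1, 2]},
    2: {"name": "beta", "description": "Second repo",
        "primary_language": "TypeScript", "user_ids": [2, 3, 4]},
}

SUMMARY_FIELDS = ("name", "description")  # what the list view exposes


def list_all_repos():
    """Like the /repos view: every repo ID plus a few summary fields."""
    return [
        {"id": repo_id, **{field: repo[field] for field in SUMMARY_FIELDS}}
        for repo_id, repo in REPOS.items()
    ]


def get_one_repo(repo_id):
    """Like the /repos/<int:repo_id> view: everything about one repo."""
    return {"id": repo_id, **REPOS[repo_id]}


print(list_all_repos())
print(get_one_repo(2))
```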

Over and Under Fetching

This is the most famous problem GraphQL solves, so we'll cover it first. Let's say you have a front-end that consumes your REST API. Now on your front-end you have a page that shows:

A list of the names and descriptions of the repos that the current user has access to, and
A list of all the users collaborating on each of the above repos.

Looking at our graph from before, User 1 would want to see the following info:

Repo	Collaborators
Repo A name and description	User 2 name and avatar

While User 4 would want to see the following info:

Repo	Collaborators
Repo C info	User 2 and User 3 info
Repo D info	User 5 info

To get that data, we would first need to fetch the current user's list of repos (/users/<int:user_id>/repos). Then, for each of those repos, we would need to fetch its basic info (/repos/<int:repo_id>) and a list of all the user IDs associated with it (/repos/<int:repo_id>/users). Finally, we would need to fetch the info we want to display for each of those users (/users/<int:user_id>).
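To see how quickly this adds up, here is a sketch that simulates the client-side request waterfall against an in-memory stand-in for the API. `fake_get` is a hypothetical helper that maps the URL patterns above onto dictionaries and records every call; the data follows the User 4 table (Repo C with Users 2 and 3, Repo D with User 5), with made-up integer IDs.

```python
import re

# Hypothetical data behind the REST API, matching the User 4 example.
USERS = {i: {"id": i, "username": f"user{i}", "avatar": f"avatar{i}.png"} for i in range(1, 6)}
REPOS = {3: {"id": 3, "name": "Repo C", "description": "..."},
         4: {"id": 4, "name": "Repo D", "description": "..."}}
ACCESS = {3: [2, 3, 4], 4: [4, 5]}  # repo_id -> user IDs with access

calls = []


def fake_get(url):
    """Stand-in for an HTTP GET; records every request made."""
    calls.append(url)
    if m := re.fullmatch(r"/users/(\d+)/repos", url):
        uid = int(m.group(1))
        return [rid for rid, uids in ACCESS.items() if uid in uids]
    if m := re.fullmatch(r"/repos/(\d+)/users", url):
        return ACCESS[int(m.group(1))]
    if m := re.fullmatch(r"/repos/(\d+)", url):
        return REPOS[int(m.group(1))]
    if m := re.fullmatch(r"/users/(\d+)", url):
        return USERS[int(m.group(1))]
    raise KeyError(url)


def build_page(current_user_id):
    """Assemble the repos-and-collaborators table one request at a time."""
    rows = []
    for repo_id in fake_get(f"/users/{current_user_id}/repos"):
        repo = fake_get(f"/repos/{repo_id}")
        collaborator_ids = fake_get(f"/repos/{repo_id}/users")
        collaborators = [fake_get(f"/users/{uid}") for uid in collaborator_ids
                         if uid != current_user_id]
        rows.append((repo["name"], [u["username"] for u in collaborators]))
    return rows


print(build_page(4))  # [('Repo C', ['user2', 'user3']), ('Repo D', ['user5'])]
print(len(calls))     # 8
```

That's eight round trips for a two-row table, and the count grows with every extra repo and collaborator.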

So that's a lot of unnecessary traffic. Each API call we make either fetches too much data or too little data. Yuck.

There is a simple way to counteract this inefficiency: write code specifically for these complicated queries. For example, you might add a view like this one to fetch exactly the info you need:

@app.route('/users/<int:user_id>/repos_and_collaborators')
def get_user_repo_collaborators(user_id):
    ...
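Inside, a view like that would typically do the joins on the server and return one payload shaped exactly like the page. Here's a stdlib-only sketch of what its body might compute; the data and the `build_repos_and_collaborators` helper are hypothetical. In Flask you'd return the resulting dict from the view, and Flask (1.1 and later) serializes it to JSON.

```python
# Hypothetical data: users, repos, and which users can access which repos.
USERS = {i: {"username": f"user{i}", "avatar": f"avatar{i}.png"} for i in range(1, 6)}
REPOS = {3: {"name": "Repo C", "description": "C's description"},
         4: {"name": "Repo D", "description": "D's description"}}
ACCESS = {3: [2, 3, 4], 4: [4, 5]}


def build_repos_and_collaborators(user_id):
    """Everything the page needs, in a single response body."""
    repos = []
    for repo_id, member_ids in ACCESS.items():
        if user_id not in member_ids:
            continue
        repos.append({
            "name": REPOS[repo_id]["name"],
            "description": REPOS[repo_id]["description"],
            # The field choice (username + avatar) is hard-coded here.
            "collaborators": [
                {"id": uid, **USERS[uid]} for uid in member_ids if uid != user_id
            ],
        })
    return {"repos": repos}


payload = build_repos_and_collaborators(4)
print(len(payload["repos"]))  # 2
```

One request instead of many, but the exact fields the page wants are baked into the server.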

This is not ideal for a lot of reasons. For starters, this approach would lead to a whole lot of extra code being written, especially if it's a significant application. More code means more testing and more bugs. The new views would also be inconsistent with the rest of the application views.

Another problem here is that if the front-end requirements change, then the get_user_repo_collaborators view should probably change as well. For example, maybe we want to include the repo's creation time or exclude the user's avatar.

And maybe there are other parts of the front-end that make use of the get_user_repo_collaborators view, so changing it might have unintended consequences. So we'll have to be super careful about where these highly specific API calls get used.

But wouldn't it be great if the front-end could just request exactly the data it needs without any back-end changes?

Introducing the GraphQL Endpoint

Let's consider a GraphQL approach:

@app.route('/graphql')
def graphql():
    ...

The first thing to notice is that there is only one endpoint here. REST relies heavily on the HTTP protocol: URLs, URL parameters, and HTTP verbs are all meaningful in REST.

But a GraphQL query is just text (kind of like how a SQL query is just text). We can pass that text over HTTP if we want to, and in this case we do, but HTTP isn't a requirement.

Without going into detail on how the graph is actually defined (that's a topic for another article), here is a query we could potentially use to fetch the exact data we need for our user interface.

query {
    user(user_id:1){       # A
        repos{             # B
            id             # C
            name           
            description    
            users{         # D
                id         # E
                username
                avatar
            }
        }
    }
}

Let's break it down:
A. This is the node we're starting at: we'll look for a User with the ID of 1.
B. This user is associated with a group of repo nodes; we'll iterate over all of the repos for this user.
C. From each repo node, we get its ID, name, and description.
D. Each repo node is associated with a group of users; we'll iterate over all of these users.
E. From each of these users, we get their ID, username, and avatar.
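To get a feel for how a server walks a query like this, here is a toy executor. It is not a GraphQL implementation (there's no parser, schema, or type checking; real GraphQL servers parse the query text and validate it against a schema first). Instead, the query is hand-translated into nested Python dicts, where a key mapped to `None` is a leaf field and a key mapped to a dict is a nested selection. The data and the `EDGES` resolvers are hypothetical.

```python
# Hypothetical graph data.
USERS = {
    1: {"id": 1, "username": "user1", "avatar": "a1.png", "repo_ids": ["A"]},
    2: {"id": 2, "username": "user2", "avatar": "a2.png", "repo_ids": ["A"]},
}
REPOS = {"A": {"id": "A", "name": "Repo A", "description": "Repo A's description",
               "user_ids": [1, 2]}}

# How to follow an edge from one node to its related nodes.
EDGES = {
    "repos": lambda user: [REPOS[r] for r in user["repo_ids"]],
    "users": lambda repo: [USERS[u] for u in repo["user_ids"]],
}


def execute(node, selection):
    """Resolve `selection` (field -> None or sub-selection) against `node`."""
    result = {}
    for field, sub in selection.items():
        if sub is None:  # leaf field: copy the scalar value
            result[field] = node[field]
        else:            # edge: resolve each related node recursively
            result[field] = [execute(child, sub) for child in EDGES[field](node)]
    return result


# Hand-translated version of the query above (steps B through E).
selection = {"repos": {"id": None, "name": None, "description": None,
                       "users": {"id": None, "username": None, "avatar": None}}}

# Step A: start at the User node with ID 1.
response = {"data": {"user": execute(USERS[1], selection)}}
print(response["data"]["user"]["repos"][0]["users"][1]["username"])  # user2
```

Removing `avatar` from the selection changes only the `selection` dict; `execute` itself stays the same.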

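In practice, GraphQL responses are JSON. (Strictly speaking, the specification doesn't mandate a serialization format, but JSON is by far the most common one.) REST, on the other hand, makes use of HTTP content negotiation mechanisms to allow for many different types of content. For example, you could use a REST endpoint to get CSV or HTML content. GraphQL doesn't give you that kind of choice.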

In this case our response would be something like:

{
    "data" : {
        "user" : {
            "repos" : [
                {
                    "id"          : "Repo A's id",
                    "name"        : "Repo A's name",
                    "description" : "Repo A's description",
                    "users" : [
                        {
                            "id"       : "User 1's id",
                            "username" : "User 1's username",
                            "avatar"   : "User 1's avatar"
                        },
                        {
                            "id"       : "User 2's id",
                            "username" : "User 2's username",
                            "avatar"   : "User 2's avatar"
                        }
                    ]
                }
            ]
        }
    }
}

Now, if the front-end requirements were to change, the front-end would just ask for different data. For example, if the repos' created_time were needed instead of the users' avatars, the query would change to something like:

query {
    user(user_id:1){      
        repos{            
            id             
            name           
            description    
            created_time   ## new attribute requirement
            users{         
                id         
                username
                # avatar no longer needed
            }
        }
    }
}

The back-end would remain the same: as long as the schema already exposes created_time, no changes would be necessary.

Errors and Permissions

Since REST APIs rely on HTTP, REST applications can make use of HTTP status codes. Basically, when an application receives an HTTP request and responds with an HTTP response, that response will contain a status code along with some data (the response body) and metadata (the response headers).

A status code of 200 means that everything is OK, 404 means Not Found, and 500 means Internal Server Error. My personal favorite is 418; you can look that one up.

So having a bunch of well-defined and meaningful status codes is really nice. For example, let's say a user tries to access a repo they aren't allowed to access. The application can respond with a 403 (Forbidden), which the user interface will understand and use to display a useful error message.
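Here's a framework-free sketch of that behavior, using the standard library's `http.HTTPStatus` for the codes. The `can_access` rule and the data are hypothetical; a Flask view can return the same kind of `(body, status)` tuple directly.

```python
from http import HTTPStatus

REPOS = {1: {"name": "Repo A"}, 2: {"name": "Repo B"}}
ACCESS = {1: {1, 2}, 2: {3}}  # hypothetical: repo_id -> user IDs allowed


def can_access(user_id, repo_id):
    return user_id in ACCESS.get(repo_id, set())


def get_one_repo(current_user_id, repo_id):
    """Return a (body, status) pair, the way a Flask view can."""
    if repo_id not in REPOS:
        return {"error": "repo not found"}, HTTPStatus.NOT_FOUND
    if not can_access(current_user_id, repo_id):
        return {"error": "you can't access this repo"}, HTTPStatus.FORBIDDEN
    return REPOS[repo_id], HTTPStatus.OK


body, status = get_one_repo(current_user_id=1, repo_id=2)
print(int(status), status.phrase)  # 403 Forbidden
```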

But there's one problem here. Let's say a user is allowed to see every other user's username and avatar, but not their phone number or email address. In other words, each user has both public and private information.
Looking at our REST endpoints from before, this would be insufficient:

@app.route('/users/<int:user_id>')
def get_one_user(user_id):
    ...

We would need a way for the requester to specify what data it wants. For example, we could implement something like this:

@app.route('/users/<int:user_id>/private_info')
def get_one_user_private_info(user_id):
    """returns ALL user info"""
    ...

@app.route('/users/<int:user_id>/public_info')
def get_one_user_public_info(user_id):
    """returns username and avatar"""
    ...

Now we just need to make use of the right endpoint when requesting user data. Easy.

And, of course, whenever we create a new custom view like get_user_repo_collaborators, we need to be careful not to expose any private information. In other words, every view has to know what can and cannot be exposed in different circumstances. This isn't very DRY. It could be made DRY, but then it's no longer very simple.
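One way to make it DRY is to declare, in one place, which fields are public and route every view's output through the same helper. A minimal sketch, assuming a hypothetical rule that users may see all of their own fields but only the public fields of other users:

```python
PUBLIC_FIELDS = {"id", "username", "avatar"}  # hypothetical visibility rule


def visible_fields(viewer_id, user):
    """Strip private fields unless the viewer is looking at themselves."""
    if viewer_id == user["id"]:
        return dict(user)
    return {k: v for k, v in user.items() if k in PUBLIC_FIELDS}


alice = {"id": 1, "username": "alice", "avatar": "a.png",
         "email": "alice@example.com", "cellphone": "555-0100"}

print(visible_fields(2, alice))  # {'id': 1, 'username': 'alice', 'avatar': 'a.png'}
print("email" in visible_fields(1, alice))  # True
```

Every view, including bespoke ones, still has to remember to call the helper, which is where the extra complexity comes in.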

The other problem with this is that REST views tend to be all or nothing.

Let's say User 1 makes an API call to /users/2/private_info. They'll get a 403 error, and they won't get even the public portions of the data. They'll have to make a whole new API call to fetch User 2's username and avatar.

The Error Array

GraphQL has a much nicer way of doing things. Let's take a look at this simple query:

query {
    users {
        username
        cellphone
    }
}

This query tries to fetch the usernames and cellphone numbers of all the users on the system. Reading other users' cellphone numbers, of course, isn't allowed. Let's say we are currently logged in as User 1. The response would (likely) be something like:

{
    "data" : {
        "users" : [
            {
                "username" : "User 1's username",
                "cellphone" : "User 1's cellphone"
            },
            {
                "username" : "User 2's username",
                "cellphone" : null    
            },
            {
                "username" : "User 3's username",
                "cellphone" : null
            },
            ...
        ]
    },
    "errors" : [
        {
            "message" : "permission denied for Field cellphone",
            "locations" : [
                {
                    "line"   : x,
                    "column" : y
                },
                ...
            ]
        },
        ...
    ]
}

What's nice about this is that we can still get the data we are allowed to get, and the error message is a bit more specific about what isn't allowed. It even refers to the exact part of the original query that was at fault.
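Here's a rough sketch of how a server can produce that kind of partial response: each field is resolved independently, a permission failure sets only that field to `null`, and an entry is appended to `errors`. The permission rule, data, message text, and location values are hypothetical; the `message`, `locations`, and `path` keys follow the error shape described in the GraphQL specification.

```python
USERS = [
    {"id": 1, "username": "user1", "cellphone": "555-0101"},
    {"id": 2, "username": "user2", "cellphone": "555-0102"},
    {"id": 3, "username": "user3", "cellphone": "555-0103"},
]
PRIVATE_FIELDS = {"cellphone"}  # hypothetical: only the owner may read these


def resolve_users(viewer_id, fields, location):
    """Resolve `fields` on every user, collecting per-field errors."""
    data, errors = [], []
    for index, user in enumerate(USERS):
        row = {}
        for field in fields:
            if field in PRIVATE_FIELDS and user["id"] != viewer_id:
                row[field] = None  # null out just this field, keep the row
                errors.append({
                    "message": f"permission denied for Field {field}",
                    "locations": [location],
                    "path": ["users", index, field],
                })
            else:
                row[field] = user[field]
        data.append(row)
    result = {"data": {"users": data}}
    if errors:  # the spec says to omit "errors" when there are none
        result["errors"] = errors
    return result


# Hypothetical position of `cellphone` in the query text.
result = resolve_users(1, ["username", "cellphone"], {"line": 4, "column": 9})
print(result["data"]["users"][1])  # {'username': 'user2', 'cellphone': None}
print(len(result["errors"]))       # 2
```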

What isn't nice about this is that the error messages don't have to match any particular standard. Your back-end application will generate the error messages however it wants to. It might be a stack trace, or it might have an application-specific error code, or it might just be some text. It's up to you and your back-end application to decide.

When your front-end application gets an error message, it needs to do something useful with it. Maybe it needs to display the message to the user as is, maybe it needs to tell the user "something went wrong on our side, our engineers are on it!", or maybe something else entirely.

Overall, this is quite a cool feature, but I suggest that you take care when designing your error-handling mechanisms and conventions.
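One convention that helps: the GraphQL specification reserves an `extensions` key on each error for additional, machine-readable information, so a back-end can attach a stable code and the front-end can branch on that code instead of parsing message text. The code names (`FORBIDDEN`, `INTERNAL`) and the user-facing strings below are hypothetical, app-specific conventions, not part of any standard.

```python
# Hypothetical error codes agreed between back-end and front-end.
USER_MESSAGES = {
    "FORBIDDEN": "You don't have permission to see some of this data.",
    "INTERNAL": "Something went wrong on our side, our engineers are on it!",
}
FALLBACK = "Something unexpected happened."


def describe_errors(response):
    """Turn a GraphQL response's `errors` array into user-facing messages."""
    messages = []
    for error in response.get("errors", []):
        code = error.get("extensions", {}).get("code")
        message = USER_MESSAGES.get(code, FALLBACK)
        if message not in messages:  # avoid repeating the same notice
            messages.append(message)
    return messages


response = {
    "data": {"users": [{"username": "user2", "cellphone": None}]},
    "errors": [{"message": "permission denied for Field cellphone",
                "extensions": {"code": "FORBIDDEN"}}],
}
print(describe_errors(response))
```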

Discoverability

Another win for GraphQL is that it is discoverable by default. Let me explain.

In order to expose a GraphQL endpoint on a web application, the first thing you need to do is create a representation of the data structure. This is called a schema, and it's roughly analogous to a database schema. We'll go through the details of how to do this in an upcoming tutorial.

Basically the schema defines what attributes are available for different nodes on your graph and explains how to move from node to node.

Now the really cool thing about the schema is that it is a graph itself. So looking at our GraphQL view:

@app.route('/graphql')
def graphql():
    ...

If this view is used to query your data structures (Users and Repos in our case), then the exact same endpoint can be used to query the schema of your graph. The schema is actually quite a complicated looking graph, so it'll be quite painful to explore it by manually writing GraphQL queries. Luckily, there are some awesome tools available that make exploration easier.

Specifically, GraphiQL (pronounced "graphical", clever, eh?). Here is an example of GraphiQL in action. If you follow that link and then open < Docs (top right-hand corner), you'll be able to explore all the different types of data available in the schema. You'll notice that there are three "root types": query, mutation, and subscription. So far in this article we've only dealt with queries (roughly equivalent to REST GETs), but we'll get to the others a bit later.
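Because the schema is queryable, the same endpoint answers introspection queries. `__schema`, `queryType`, `types`, `name`, and `kind` are part of the introspection system defined by the GraphQL specification. The endpoint URL below is a placeholder, and the sketch only builds the request without sending it; posting a JSON body with a `query` key is the common convention for GraphQL over HTTP rather than something the core spec mandates.

```python
import json
import urllib.request

# Introspection: ask the schema to describe itself.
INTROSPECTION_QUERY = """
query {
    __schema {
        queryType { name }
        types { name kind }
    }
}
"""


def build_graphql_request(url, query, variables=None):
    """Build (but don't send) a POST request carrying a GraphQL query as JSON."""
    body = json.dumps({"query": query, "variables": variables or {}}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_graphql_request("http://localhost:5000/graphql", INTROSPECTION_QUERY)
print(request.get_method(), request.full_url)  # POST http://localhost:5000/graphql
# urllib.request.urlopen(request) would send it to a running server.
```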

To conclude, GraphQL is discoverable by default, whereas REST simply is not. If you want to make your REST API discoverable, you will need to use extra tools to make it happen. For example, Hug is a Python web framework that automatically generates documentation. Another one is Swagger, which is a document format and tool-set that is fairly language agnostic.

Caching

The next big difference between REST and GraphQL is caching.

REST caching leverages HTTP caching. If you GET /users/1/public_info twice in a row, you'll receive the same response both times (unless, of course, the back-end data changed in between).

If your code is running on a modern browser, you get HTTP caching for free. What happens is something like this:

I won't go into details on how to tweak and configure your HTTP caching in this article. I just want to stress that HTTP caching is a mature, robust, well documented technology, and it can speed up your REST applications in a big way.
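One common piece of that machinery is validation with ETags: the server labels each response with a fingerprint, the client sends it back in an `If-None-Match` header, and the server replies `304 Not Modified` with no body if nothing has changed. Here's a stdlib-only simulation; the hashing scheme and function names are illustrative, and in a real app the browser and your framework handle most of this for you.

```python
import hashlib
import json
from http import HTTPStatus

USER_1_PUBLIC = {"username": "user1", "avatar": "a1.png"}  # hypothetical data


def make_etag(body):
    """Fingerprint a JSON body; any stable hash of the content works."""
    raw = json.dumps(body, sort_keys=True).encode("utf-8")
    return '"' + hashlib.sha256(raw).hexdigest()[:16] + '"'


def get_public_info(request_headers):
    """Server side of GET /users/1/public_info with conditional requests.

    Simplified: a real If-None-Match header may list several ETags.
    """
    etag = make_etag(USER_1_PUBLIC)
    if request_headers.get("If-None-Match") == etag:
        return HTTPStatus.NOT_MODIFIED, {"ETag": etag}, None  # client's copy is fresh
    return HTTPStatus.OK, {"ETag": etag}, USER_1_PUBLIC


# First request: full response, which the client caches along with its ETag.
status, headers, body = get_public_info({})
cached_body, cached_etag = body, headers["ETag"]

# Second request: the client revalidates instead of re-downloading.
status, headers, body = get_public_info({"If-None-Match": cached_etag})
print(int(status), body)  # 304 None
```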

GraphQL, on the other hand, doesn't know about HTTP caching. We have one GraphQL endpoint and it looks like this:

@app.route('/graphql')
def graphql():
    ...

When you want to fetch public User data, you send a request to /graphql; when you want to fetch the GraphQL schema, you send a request to /graphql; when you want to get information about Repos, you send a request to /graphql.

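HTTP caching works on the premise that if you fetch data from the same URL more than once (keeping headers constant), you should get the same data. GraphQL simply doesn't work like that: a single URL serves every query, and the query itself usually travels in a POST request body, which HTTP caches generally don't cache.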

But that doesn't mean GraphQL doesn't have a caching solution. It just doesn't rely on the underlying network technologies.

If you have a front-end application that consumes a GraphQL endpoint, then best practice would be to use a GraphQL client, such as Relay or Apollo. A good GraphQL client has, among other things, caching mechanisms built in. I'll be talking about these in more depth in an upcoming article.
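Clients like Apollo get much of their caching power from normalization: each object in a response is stored once under a key built from its type and ID, and query results point at those shared entries. The sketch below illustrates that idea only; it is not Apollo's or Relay's API, and the `__typename` plus `id` key is an assumption (it matches Apollo's default convention, but your client may be configured differently).

```python
def normalize(obj, store):
    """Store every object that has __typename and id under one key; return a reference."""
    if isinstance(obj, list):
        return [normalize(item, store) for item in obj]
    if not isinstance(obj, dict):
        return obj  # scalar value
    fields = {key: normalize(value, store) for key, value in obj.items()}
    if "__typename" in obj and "id" in obj:
        key = f'{obj["__typename"]}:{obj["id"]}'
        store.setdefault(key, {}).update(fields)  # merge with what we already know
        return {"__ref": key}
    return fields


store = {}
# Two different query results that both mention User 2.
normalize({"repos": [{"__typename": "Repo", "id": "A",
                      "users": [{"__typename": "User", "id": 2, "username": "user2"}]}]}, store)
normalize({"user": {"__typename": "User", "id": 2, "avatar": "a2.png"}}, store)

print(store["User:2"])
```

Because both results share one `User:2` entry, data fetched by either query is available to every view that references that user.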

So to conclude, GraphQL has caching, but you don't get it for free.

GraphQL Mutations

Up until this point, we've only spoken about REST GETs and GraphQL queries. REST also has conventions around creating, updating, and deleting objects. Basically, REST prescribes something like this:

HTTP verb	URL pattern	Action
POST	/repos	Create a new Repo
GET	/repos/<int:repo_id>	Get info about the Repo
PUT	/repos/<int:repo_id>	Edit the Repo
DELETE	/repos/<int:repo_id>	Delete the Repo

Which is pretty sensible.
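Those conventions are mechanical enough to express as a small dispatch table keyed on the HTTP verb and URL pattern. The regexes and action names below are hypothetical stand-ins for what a framework's router does for you.

```python
import re

# (HTTP verb, URL pattern, action name), following the REST conventions for repos.
ROUTES = [
    ("POST", r"/repos", "create_repo"),
    ("GET", r"/repos/(?P<repo_id>\d+)", "get_repo"),
    ("PUT", r"/repos/(?P<repo_id>\d+)", "edit_repo"),
    ("DELETE", r"/repos/(?P<repo_id>\d+)", "delete_repo"),
]


def dispatch(verb, path):
    """Return (action, url_params) for a request, or raise LookupError."""
    for route_verb, pattern, action in ROUTES:
        match = re.fullmatch(pattern, path)
        if match and route_verb == verb:
            params = {k: int(v) for k, v in match.groupdict().items()}
            return action, params
    raise LookupError(f"no route for {verb} {path}")


print(dispatch("PUT", "/repos/7"))  # ('edit_repo', {'repo_id': 7})
```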

GraphQL is not so strict here. Let's refer back to the GraphiQL front-end we saw earlier. Expand the Docs, click on Mutation, and you'll see a big list of what looks like function prototypes. Let's look at the first one:

createComment(
    text: String! 
    postId: ID
    post: CommentpostPost
    userId: ID
    ): Comment

You can think of this like a function. This function has a few parameters, and it returns a Comment object. Comment is a type of object defined in this particular GraphQL schema; it has attributes and is linked to other objects, so you can decide exactly what information you want back from it.
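From the client's side, calling a mutation looks much like sending a query: a document that starts with `mutation`, plus variables. The sketch below only builds the JSON payload. It assumes the `createComment` signature shown above and, hypothetically, that `Comment` has `id` and `text` fields; check the actual schema in GraphiQL before relying on either.

```python
import json

# `$text: String!` matches the required argument; `$postId: ID` is optional.
# The selection (id, text) picks which Comment fields come back (assumed fields).
CREATE_COMMENT = """
mutation CreateComment($text: String!, $postId: ID) {
    createComment(text: $text, postId: $postId) {
        id
        text
    }
}
"""


def mutation_payload(text, post_id=None):
    """JSON body for a POST to the GraphQL endpoint."""
    variables = {"text": text}
    if post_id is not None:
        variables["postId"] = post_id
    return json.dumps({"query": CREATE_COMMENT, "variables": variables})


payload = json.loads(mutation_payload("Nice post!", post_id="42"))
print(payload["variables"])  # {'text': 'Nice post!', 'postId': '42'}
```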

If you are creating your own GraphQL schema, then you get to choose which functions to expose and what those functions do. This is great, as it gives you a lot of flexibility. But because all your mutations are lumped together into one long list, you need to be careful about your naming conventions.

GraphQL Subscriptions

Now that we've covered queries and mutations, the last GraphQL root type is Subscription. REST doesn't have an equivalent concept here, but that shouldn't stop you from getting creative and implementing them your own way.

It often makes sense for a server to push data to a client in modern applications. This is easier to explain with an example.

Let's say you are writing a chat app. Consider the following:

This is called polling. The client just keeps asking if there are any new messages again and again. This is not exactly efficient.
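In code, polling boils down to a loop like the one below. `fetch_new_messages` is a hypothetical stand-in for an HTTP request to some "messages since X" endpoint; here it just reads from a list so the sketch runs on its own.

```python
import time

SERVER_MESSAGES = [(3, "hi"), (7, "anyone there?")]  # hypothetical (timestamp, text)


def fetch_new_messages(since):
    """Pretend HTTP call: return messages newer than `since`."""
    return [m for m in SERVER_MESSAGES if m[0] > since]


def poll(rounds, interval=0.0):
    """Ask the server for new messages `rounds` times; return (requests, received)."""
    last_seen, received, requests = 0, [], 0
    for _ in range(rounds):
        requests += 1
        new = fetch_new_messages(last_seen)
        if new:
            received.extend(text for _, text in new)
            last_seen = new[-1][0]
        time.sleep(interval)  # a real client would wait seconds between polls
    return requests, received


print(poll(rounds=5))  # (5, ['hi', 'anyone there?'])
```

Five requests, and only the first one returned anything new.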

Here is how GraphQL wants you to do things:

Better.

Basically, GraphQL is designed with modern web applications in mind, and modern web applications often need real-time data. So GraphQL has mechanisms for allowing clients to subscribe to updates.
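The push model can be sketched with asyncio: each subscriber registers a queue, and the server puts every new message on every queue as it arrives, so nobody has to keep asking. This illustrates the publish/subscribe idea only; real GraphQL subscriptions typically run over a persistent connection such as a WebSocket, and the `ChatRoom` class here is hypothetical.

```python
import asyncio


class ChatRoom:
    """Hypothetical in-process pub/sub hub."""

    def __init__(self):
        self.subscribers = []

    def subscribe(self):
        queue = asyncio.Queue()
        self.subscribers.append(queue)
        return queue

    async def publish(self, message):
        for queue in self.subscribers:  # push to everyone who subscribed
            await queue.put(message)


async def main():
    room = ChatRoom()
    inbox = room.subscribe()
    await room.publish("hi")
    await room.publish("anyone there?")
    # The client just waits on its queue; each message arrives once, with no polling.
    return [await inbox.get(), await inbox.get()]


print(asyncio.run(main()))  # ['hi', 'anyone there?']
```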

There is nothing stopping you from making a REST-based subscription mechanism, but it would suffer from the normal shortcomings of REST in that it wouldn't be discoverable and it would likely be highly coupled to the front-end. This means if the front-end wanted to subscribe to an extra attribute, then a whole new function might need to be written.

File Uploads

The last major difference between REST and GraphQL is that GraphQL doesn't understand files; it only understands text. REST, of course, isn't limited in this way, because you can POST or PUT whatever content you want (within reason).

If you need to allow file uploads in your GraphQL web app, I'd suggest making a separate API endpoint just for that.
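The reason is that a GraphQL request is text, so a file would have to be encoded first, for example as base64 inside a JSON variable. Base64 turns every 3 bytes into 4 characters, so the payload grows by about a third before any JSON overhead. A quick check with the standard library (the `uploadFile` mutation is hypothetical):

```python
import base64
import json

file_bytes = bytes(range(256)) * 12  # 3,072 bytes of sample binary data

encoded = base64.b64encode(file_bytes).decode("ascii")
# Hypothetical mutation: the file travels as a plain String variable.
payload = json.dumps({
    "query": "mutation Upload($data: String!) { uploadFile(data: $data) { id } }",
    "variables": {"data": encoded},
})

print(len(file_bytes), len(encoded))  # 3072 4096
print(len(payload) > len(encoded))    # True
```

A separate upload endpoint can accept the raw bytes instead, and hand back an ID that later GraphQL mutations refer to.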

Conclusion

Most of the differences between GraphQL and REST can be derived from the fact that REST relies heavily on HTTP while GraphQL is protocol agnostic (meaning it is just text). So while REST has HTTP verbs, HTTP-caching, and content-negotiation, GraphQL just doesn't. REST allows file uploads and informative URL structures, while GraphQL doesn't.

But a lot has changed since REST was invented, and GraphQL is designed to overcome the limitations of REST. Most notably, GraphQL overcomes the problems of over- and under-fetching and allows partial responses in the case of server-side errors. GraphQL is also highly discoverable by default, which is absolutely brilliant. It also allows for subscriptions. And even though you don't get to leverage HTTP caching if you use GraphQL, that doesn't mean you can't cache.

For the most part, GraphQL beats REST hands down.

But there is a bit of a learning curve if you want to actually implement a GraphQL application. You'll need to create a GraphQL schema, and you'll (probably) need to use a GraphQL client and write GraphQL queries. Compared to REST, GraphQL has quite a steep learning curve. That said, once you know what you're doing, GraphQL applications can be very quick and fun to write.



6 Replies

Anton

7 years ago

Thank you for the article.
I would recommend taking a look at https://jsonapi.org/

In JSON:API you don’t need to create many endpoints for each new requirement like /users/<int:user_id>/repos_and_collaborators.

You will be able to make a call with filter and include:
/repositories?filter[owner][id]=%USER_ID%&filter[owner][type]=User&include=collaborators

Varuna Bamunusinghe

7 years ago

Thanks for the well explained article. What is the stack you use for GraphQL development?

Sheena

7 years ago

One example stack would be React + Apollo (or Relay) on the front end, then Python, Flask, SQLAlchemy, and Graphene on the back end.

Mike Schinkel

7 years ago

Good article, balanced. Glad to see you mention that REST supports the mature caching of HTTP.

Most people promoting GraphQL give the impression that REST is either legacy or stupid, and that is of course absolutely not the case. REST is best for long-lived systems, like the web itself, and not so great for the constantly changing requirements of front-end web developers who need to pull data from back-end sources for sites that will probably only live 1-3 years anyway. As you explain.
