Building A Real-Time Twitter Mood Visualization Using Emojis

Published Jan 23, 2018. Last updated Feb 17, 2018.

Introduction

It was a Thursday and the Academy Awards were happening on Sunday. An idea came up: could we figure out people’s emotions on Twitter during the Oscars, with emojis, so that Coca-Cola could create custom tweets? I had four days to get it done, and I did. This is how it was accomplished, without losing a night of sleep.

I also learned a lot from this experience, which is why I wanted to write about it. The main lessons were:

  • Choosing the right steps is essential to delivering on time.
  • The development process is more important than the code.
  • Never assume things before starting a task.
  • Baby steps. It's a lot easier taking a small step back earlier in the game than after spending
    a whole week on a task.

What was the challenge

The challenge was to build, in a very short time, a real-time data-viz of tweet
emotions based on emojis. The team was two people pair programming: me on the back-end and a front-end developer.

We had to connect to the Twitter stream, get every tweet in PT-BR about the Oscars, parse it, detect the emotion, and display the result in a visualization. The project would be shown on a Coca-Cola War Room Dashboard for everyone there to see.

Right steps to succeed

We started with the part we had the most fear and doubt about: parsing emotions from tweets based on emojis. Only after that could we see what the next step was.

We never start a task based on assumptions, and this was very important with
such a tight deadline. Assumptions could have taken us off track and led us to build a lot
of things that did not matter.

An excellent choice we made was designing the
server structure only after understanding exactly how the Twitter API behaved and how
much traffic the stream could produce. In the end, the volume was smaller than we imagined and a single
Heroku instance could handle the stream.

I know it's scary to "waste" time testing and checking things when you have
a short deadline, but it's never a waste. So write tests, read the
documentation, and try things before starting the task.

Our stack

We used what we regularly use. We didn't have the time to experiment and,
because of that, we used Python and Django.

Python is great with NLP and has libraries ready to help you. Also, a lot of data scientists use it. Plus, I even found some code snippets online that helped a lot!

Like I said before, the first step is always understanding your
problem better. We started by checking whether someone had already done this and, luckily, we found an implementation in Python that used Pandas.

We used this lib to count the total occurrences of each emoji in a text. With that, we could transform the emojis into emotions by implementing an "emoji to
emotion dictionary."
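To make the idea concrete, here is a minimal sketch in plain Python rather than the Pandas-based lib we actually used. The `EMOJI_TO_EMOTION` mapping and the `count_emotions` name are assumptions for illustration, not our production dictionary, and the character-by-character loop only handles emojis made of a single code point:

```python
from collections import Counter

# Hypothetical mapping for illustration; a real dictionary would cover
# many more emojis and emotions.
EMOJI_TO_EMOTION = {
    "😍": "LOVE",
    "❤": "LOVE",
    "😂": "HAPPY",
    "😀": "HAPPY",
    "😡": "ANGRY",
    "😢": "SAD",
}


def count_emotions(text):
    """Count how often each emotion appears in a text, based on its emojis."""
    counts = Counter()
    for char in text:
        emotion = EMOJI_TO_EMOTION.get(char)
        if emotion is not None:
            counts[emotion] += 1
    return dict(counts)
```

A tweet with no known emoji simply returns an empty dictionary, which makes it easy to skip later on.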

The data-viz

After transforming text into emotions using emojis, the next step was to build a data-viz for it. We knew that many kinds of charts could work well for this task, but we had no time to experiment. So we chose the most versatile graph in the world: bar charts!

It was essential to plan the chart beforehand so we could see what kind of data
structure would be better to use. That way we could give Python the
responsibility of saving the data with the correct structure rather than having to do
it in JavaScript. D3.js would build the charts just right.

The web app

We already knew how to parse the tweets and how we were going to display them, so we could finally start our Django project.

I decided to use the database as a middleman between the front-end and the Twitter stream, so we would get the tweets, parse them, save them in the database, and our API could get the tweets from there.

Because we started from the end (the chart), we could build a data structure that let a single database query aggregate the data. The heavy lifting would be done in the database, which is a lot faster than aggregating in Python.

The Data structure

Because we chose a bar chart, our data primarily needed to be a summary of
the total count of each emotion at some point in time. It was something like
this:

{
  'ANGRY': 4,
  'LOVE': 10,
  'HAPPY': 2,
  ...
}

To achieve that, we saved the emotion counts of every tweet so that we could query them later using Django annotate and get exactly that summary. It is important to note that because we needed to do this on a relatively big database, using db_index helped us a lot!
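The sketch below shows the same idea with Python's built-in sqlite3 module instead of Django, so it runs on its own. The table layout (one row per tweet and emotion), the column names, and the function names are assumptions, not our actual models. The GROUP BY query is the kind of SQL that a Django `values(...).annotate(Sum(...))` call produces, and the explicit index plays the role of `db_index=True`:

```python
import sqlite3


def create_schema(conn):
    """Create a hypothetical table with one row per (tweet, emotion)."""
    conn.execute(
        """
        CREATE TABLE tweet_emotion (
            tweet_id   INTEGER NOT NULL,
            emotion    TEXT    NOT NULL,
            total      INTEGER NOT NULL,
            created_at TEXT    NOT NULL  -- ISO 8601 timestamp
        )
        """
    )
    # Index the column used to filter by time, as db_index would in Django.
    conn.execute("CREATE INDEX idx_created_at ON tweet_emotion (created_at)")


def emotion_summary(conn, since):
    """Let the database sum the counts per emotion from `since` onwards."""
    rows = conn.execute(
        """
        SELECT emotion, SUM(total)
        FROM tweet_emotion
        WHERE created_at >= ?
        GROUP BY emotion
        """,
        (since,),
    )
    return dict(rows.fetchall())
```

Calling `emotion_summary(conn, some_timestamp)` returns a dictionary in the same shape as the summary above, ready to be serialized for the chart.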

Getting the tweets

We used Tweepy to connect to Twitter. It's very helpful to have libs to serve you, especially with well-known APIs, because they handle some complex problems and caveats.

In our case, where we needed to use the Twitter Streaming API, the lib was very convenient because we didn't have to worry about keeping a connection alive, know what the best strategy for handling tweets async was, or figure out the best
way to do it in Python and Django.

Everything was already there. We just needed to set up a Django command to start what they called a
Stream and we were done.
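As a rough sketch of the handler side only: the class below deliberately does not import Tweepy, because its streaming classes have changed between versions. The `on_status` name follows the callback that Tweepy's stream listener used at the time, and the `status.id` and `status.text` attributes mirror its tweet objects. The `count_emotions` and `save` callables are hypothetical stand-ins for our parser and database code:

```python
class EmotionListener:
    """Receives tweets from a stream and stores their emotion counts.

    In a Tweepy-based setup, this logic would live in a subclass of the
    library's stream listener, which calls on_status() for each tweet.
    """

    def __init__(self, count_emotions, save):
        self.count_emotions = count_emotions  # text -> {emotion: count}
        self.save = save  # persists the counts of one tweet

    def on_status(self, status):
        counts = self.count_emotions(status.text)
        if counts:  # ignore tweets without any known emoji
            self.save(status.id, counts)
```

Keeping the parsing and saving functions injectable also made it possible to test the handler without opening a real connection to Twitter.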

Deploy

We used Heroku for this project, primarily because all of the other projects were there and because we didn't have a DevOps team.

Using Heroku is great because it makes it easy to scale the app or Postgres if we need to.

Deploying there is fast and tracking metrics is easier.
Our setup there was a worker running the Django command and another one
running the web app. Easy breezy.

The big day

On Academy Awards day, we were relaxed because we had already tested it with
different hot trending topics and the app was running smoothly.

The results were impressive — the team had real-time feedback about what
the mood was during the event.

One specific case was fascinating: Lady Gaga was
singing, and at first the song was sad, so everybody was also sad. Later, it
turned, and the emotions in the dashboard immediately changed to happy. It
was very nice to see the app working in real-time.

From a POC to sales

After this successful experiment, the company knew it had a valuable
product on its hands and started showing it to other potential clients. I had to transform an app designed to work with a single Heroku instance without login, with specific brand colors and logo, into a SaaS product.

To maintain the current buzz around it, our first step was to transform the product into
a white-label project. For each new client, we created an entirely new Git
branch and Heroku instance with custom ENV variables. Because of this, we
managed to show off custom instances of this product to many clients and
even sold one.

By the way, ENV variables can help a lot in Heroku. You can quickly change
them using the interface, which can change essential things in your app — in
our case, the color scheme, brand name, and logo.
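A minimal sketch of that white-label setup, assuming hypothetical variable names (`BRAND_NAME`, `BRAND_COLOR`, `BRAND_LOGO_URL`) and default values that are not the ones we used:

```python
import os

# Hypothetical defaults, used when a variable is not set on the instance.
DEFAULT_BRANDING = {
    "BRAND_NAME": "Mood Dashboard",
    "BRAND_COLOR": "#333333",
    "BRAND_LOGO_URL": "/static/logo.png",
}


def load_branding(environ=None):
    """Read branding settings from ENV variables, falling back to defaults."""
    environ = os.environ if environ is None else environ
    return {
        key: environ.get(key, default)
        for key, default in DEFAULT_BRANDING.items()
    }
```

With something like this in the settings, each client instance only differs by its ENV variables, so no code change is needed to rebrand it.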

After that, we separated the project into two parts. One was
the admin, where the user logged in and created a new search, and the other was the
front-end and the worker connected to the API.

The challenging part was finding a way to separate the configs from the workers, because until then everything was "hardcoded" in Heroku ENV.

The admin part became a new Django project connected to our SSO. It was a simple CRUD where the user could only create one search.

This limitation was so we could launch earlier and do the instance part manually, which meant that every time a client created a new search, we received an email with the settings to set up and deploy a new Heroku app for the front-end and worker.

That being said, we knew that Heroku had an API we could implement later to make everything automatic when necessary.

The worker and front-end were the original project, but instead of getting
the config from the ENV, they got it from the other project via API. The only setting left in the ENV was the SearchID that the client had just created.

This connection used HTTPS with HTTP Basic authentication. During development, we
found out that developing an app with another project as a dependency was
especially tricky.

Using multiple back-ends

One way to solve the dependency between the front-end and the admin app was to use some tool to boot the entire development environment together, but I
didn't like the idea of adding more things to this project.

The solution was to make two back-ends, one that connected to the API and was the
default for production, and another that got the config from ENV. It's a
good solution because Django already uses something like that to send emails, for example.

This kind of decision is worth highlighting because it made the project
easier to maintain in the long run.
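Here is a sketch of how two such config back-ends could look. All names are assumptions: the classes, the `CONFIG_BACKEND`, `TRACK_TERMS`, and `ADMIN_API_*` variables, and the `/searches/<id>/` URL layout are illustrative and not the project's real ones:

```python
import base64
import json
import os
import urllib.request


class EnvConfigBackend:
    """Reads the search config from ENV variables (handy in development)."""

    def __init__(self, environ=None):
        self.environ = os.environ if environ is None else environ

    def get_config(self, search_id):
        terms = self.environ.get("TRACK_TERMS", "")
        return {
            "search_id": search_id,
            "track": [term for term in terms.split(",") if term],
            "brand_name": self.environ.get("BRAND_NAME", ""),
        }


class ApiConfigBackend:
    """Fetches the search config from the admin project using HTTP Basic."""

    def __init__(self, base_url, username, password):
        self.base_url = base_url.rstrip("/")
        token = base64.b64encode(f"{username}:{password}".encode()).decode()
        self.headers = {"Authorization": f"Basic {token}"}

    def get_config(self, search_id):
        # Hypothetical endpoint; the real admin API may be laid out differently.
        request = urllib.request.Request(
            f"{self.base_url}/searches/{search_id}/", headers=self.headers
        )
        with urllib.request.urlopen(request) as response:
            return json.load(response)


def get_config_backend(environ=None):
    """Pick the back-end from ENV; the API back-end is the default."""
    environ = os.environ if environ is None else environ
    if environ.get("CONFIG_BACKEND", "api") == "env":
        return EnvConfigBackend(environ)
    return ApiConfigBackend(
        environ["ADMIN_API_URL"],
        environ["ADMIN_API_USER"],
        environ["ADMIN_API_PASSWORD"],
    )
```

Both classes expose the same `get_config` method, so the worker and front-end do not need to know which one is active. In development, setting `CONFIG_BACKEND=env` removes the need to run the admin project at all.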

Conclusion

Later, the product had fewer requests and we stopped development at
that stage. Keeping things simple makes me feel more comfortable knowing that even
if the product didn't succeed as a SaaS, we never spent more time and money than
necessary.

I hope my experience can help with your projects.

Discover and read more posts from Thiago Garcia