ScaleGrid.io

Caching Tweets Using Node.js, Redis and Socket.io

Published Aug 28, 2017. Last updated Aug 29, 2017.

ScaleGrid is the only MongoDB and Redis hosting solution that lets you manage MongoDB and Redis instances on both public clouds and on-premises from a single central console. Try us free for 30 days.

This blog post was originally published on our ScaleGrid blog and focuses on building a streaming list of tweets based on a search query entered by the user. The tweets will be fetched using Twitter’s Streaming API, stored in a Redis list and updated in the front-end using Socket.io. We will primarily be using Redis as a caching layer for fetching tweets.

Introduction

Here is a brief description of the technologies we will be using:

Redis

Redis is an open-source (BSD licensed), in-memory data structure store, used as a database, cache, and message broker. It supports data structures such as strings, hashes, lists, sets, sorted sets with range queries, bitmaps, hyperloglogs, and geospatial indexes with radius queries.

Node.js

Node.js is a platform built on Chrome’s JavaScript runtime for easily building fast and scalable network applications. Node.js uses an event-driven, non-blocking I/O model that makes it lightweight and efficient, and thus perfect for data-intensive real-time applications that run across distributed devices.

Express.js

Express.js is a web framework for Node.js. It lets you create a web server and write the server-side code of an application in JavaScript, much as you would with server-side frameworks in other languages.

Socket.IO

Socket.IO is a JavaScript library for real-time web applications. It enables real-time, bi-directional communication between web clients and servers. It has two parts: a client-side library that runs in the browser, and a server-side library for Node.js. Both components have nearly identical APIs.

Heroku

Heroku is a cloud platform that lets companies build, deliver, monitor, and scale apps without managing the underlying infrastructure.

This article assumes that you already have Redis, Node.js, and the Heroku Toolbelt installed on your machine.

Setup

Download the code from the following repository: https://github.com/Scalegrid/code-samples/tree/sg-redis-node-socket-twitter-search/node-socket-redis-twitter-hashtags
Run npm install to install the dependencies.
Finally, start the Node server with “node index.js”. You can also run “nodemon”, which restarts the server whenever a file changes.

You can also access a hosted version of this app here: https://node-socket-redis-stream-tweet.herokuapp.com/

The Process

Here is a brief description of the process that we will be using to build the demo application:

We will start by accepting a search query from the user. The query can be Twitter mentions, hashtags or any random search text.
Once we have the search query, we will send it to Twitter’s Streaming API to fetch tweets. Since it is a stream, we will listen for tweets as the API sends them.
As soon as a tweet is retrieved, we will store it in a Redis list and broadcast it to the front-end.

What are Redis lists?

Redis lists are implemented as linked lists. This means that even if you have millions of elements inside a list, adding a new element at the head or at the tail of the list is performed in constant time. Adding a new element with the LPUSH command to the head of a list with ten elements takes the same time as adding an element to the head of a list with 10 million elements.
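
To see why a push at the head costs the same regardless of length, here is a minimal sketch of a doubly linked list in JavaScript. It only illustrates the idea and is not how Redis is implemented internally; the class and method names are made up for this example.

```javascript
// Illustrative doubly linked list (hypothetical names, not Redis internals).
class LinkedList {
  constructor() {
    this.head = null;
    this.tail = null;
    this.length = 0;
  }

  // Add a value at the head: only a few pointers change, so the cost
  // does not depend on how many elements the list already holds.
  pushHead(value) {
    const node = { value, prev: null, next: this.head };
    if (this.head) this.head.prev = node;
    else this.tail = node;
    this.head = node;
    this.length += 1;
    return this.length;
  }

  // Add a value at the tail, also in constant time.
  pushTail(value) {
    const node = { value, prev: this.tail, next: null };
    if (this.tail) this.tail.next = node;
    else this.head = node;
    this.tail = node;
    this.length += 1;
    return this.length;
  }

  // Walk from head to tail and collect the values (linear time).
  toArray() {
    const out = [];
    for (let n = this.head; n; n = n.next) out.push(n.value);
    return out;
  }
}

const list = new LinkedList();
list.pushHead('tweet 1');
list.pushHead('tweet 2');
list.pushTail('tweet 0');
console.log(list.toArray()); // [ 'tweet 2', 'tweet 1', 'tweet 0' ]
```

The trade-off is that reaching an element in the middle means walking the list, which is why this structure suits "push at one end, read from the same end" workloads such as a feed of recent tweets.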

In our application, we will be storing the tweets received via the API in a list called “tweets”. We will use LPUSH to push each newly received tweet to the list, trim the list using LTRIM to limit the amount of memory used (an unbounded stream can take up a lot of space), fetch the latest tweet using LRANGE, and broadcast it to the front-end, where it will be appended to the streaming list.

What are LPUSH, LTRIM and LRANGE?

These are Redis commands for adding elements to a list, trimming it, and reading from it. Here is a brief description of each:

LPUSH

Insert all the specified values at the head of the list stored at key. If key does not exist, it is created as an empty list before performing the push operations. When key holds a value that is not a list, an error is returned.
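
The behavior described above can be modeled in a few lines of plain JavaScript. This is an in-memory illustration of the semantics only, not a Redis client; the `store` map and the `lpush` function are hypothetical stand-ins.

```javascript
// In-memory model of LPUSH semantics (illustration only, not a Redis client).
const store = new Map();

function lpush(key, ...values) {
  if (!store.has(key)) store.set(key, []); // a missing key starts as an empty list
  const list = store.get(key);
  if (!Array.isArray(list)) {
    throw new TypeError('WRONGTYPE: key holds a value that is not a list');
  }
  // Values are pushed one at a time, so the last argument ends up at the head.
  for (const value of values) list.unshift(value);
  return list.length; // LPUSH replies with the new length of the list
}

console.log(lpush('tweets', 'a', 'b', 'c')); // 3
console.log(store.get('tweets'));            // [ 'c', 'b', 'a' ]
```

Note the ordering: because each value is inserted at the head in turn, pushing a, b, c leaves c as the first element.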


LTRIM

Trim an existing list so that it will contain only the range of elements specified. Both start and stop are zero-based indexes, where 0 is the first element of the list (the head), 1 the next element, and so on.
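
As a rough illustration of these index rules, the following sketch models LTRIM on a plain JavaScript array. It is not a Redis client, and the `ltrim` function is a hypothetical stand-in; the points to notice are that the stop index is inclusive and that out-of-range indexes are clamped rather than rejected.

```javascript
// In-memory model of LTRIM on a plain array (illustration only).
// Like the real command, it modifies the list in place and replies "OK".
function ltrim(list, start, stop) {
  const len = list.length;
  // Negative indexes count from the tail: -1 is the last element.
  if (start < 0) start = Math.max(len + start, 0);
  if (stop < 0) stop = len + stop;
  // Out-of-range indexes are not errors; stop is clamped to the last element.
  if (stop >= len) stop = len - 1;
  // The stop index is inclusive, unlike Array.prototype.slice.
  const kept = start > stop ? [] : list.slice(start, stop + 1);
  list.splice(0, list.length, ...kept);
  return 'OK';
}

const tweets = ['t5', 't4', 't3', 't2', 't1']; // head (newest) first
ltrim(tweets, 0, 2);
console.log(tweets); // [ 't5', 't4', 't3' ]
```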


LRANGE

Returns the specified elements of the list stored at key. The offsets start and stop are zero-based indexes, with 0 being the first element of the list (the head of the list), 1 being the next, and so on.

These offsets can also be negative numbers indicating positions from the end of the list. For example, -1 is the last element of the list, -2 the penultimate, and so on.
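
The offsets, including the negative ones, can be tried out with a small model on a plain JavaScript array. As before, this is only an illustration of the semantics, and `lrange` here is a hypothetical function rather than a client call.

```javascript
// In-memory model of LRANGE on a plain array (illustration only).
function lrange(list, start, stop) {
  const len = list.length;
  const resolve = (i) => (i < 0 ? len + i : i); // -1 -> last, -2 -> penultimate
  const from = Math.max(resolve(start), 0);
  const to = Math.min(resolve(stop), len - 1);
  return from > to ? [] : list.slice(from, to + 1); // stop is inclusive
}

const tweets = ['t3', 't2', 't1']; // head first
console.log(lrange(tweets, 0, 0));   // [ 't3' ]
console.log(lrange(tweets, 0, -1));  // [ 't3', 't2', 't1' ]
console.log(lrange(tweets, -2, -1)); // [ 't2', 't1' ]
console.log(tweets.length);          // 3
```

Unlike LTRIM, reading a range leaves the list unchanged, which is why the length is still 3 at the end.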


Building the application

Our demo requires both a front-end and a back-end. Our front-end is a pretty simple text box with a button that will be used to start the stream.


We need a helper function to build a tweet box once we receive the tweet from our back-end:
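
A helper along these lines might look like the sketch below. It is not the code from the repository: the function names and CSS class are hypothetical, and it assumes the tweet carries `text` and `user.screen_name` fields.

```javascript
// Escape text before placing it in HTML so tweet content cannot inject markup.
function escapeHtml(text) {
  return String(text)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// Build the markup for one tweet box. The CSS class name is hypothetical.
function buildTweetBox(tweet) {
  const user = tweet.user || {};
  return (
    '<div class="tweet-box">' +
    '<strong>@' + escapeHtml(user.screen_name || 'unknown') + '</strong>' +
    '<p>' + escapeHtml(tweet.text || '') + '</p>' +
    '</div>'
  );
}

console.log(buildTweetBox({ user: { screen_name: 'scalegridio' }, text: 'Redis <3 Node' }));
```

In the browser, the returned string could be added to the top of the streaming list with something like `listElement.insertAdjacentHTML('afterbegin', buildTweetBox(tweet))`, where `listElement` is whatever container your page uses.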


We also need a listener to stop the stream and prevent adding any more tweets to the streaming list:
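
One possible shape for that listener is sketched below. It assumes the socket exposes `on(event, handler)` the way a Socket.IO client does; the event name `tweet` and the `createTweetFeed` function are hypothetical. Node's `EventEmitter` stands in for the socket so the example runs outside a browser.

```javascript
const { EventEmitter } = require('events');

// `socket` is assumed to expose on(event, handler), as a Socket.IO client does.
// The event name 'tweet' and the function names are hypothetical.
function createTweetFeed(socket, onTweet) {
  let streaming = false;
  socket.on('tweet', (tweet) => {
    if (streaming) onTweet(tweet); // drop tweets that arrive after stop()
  });
  return {
    start() { streaming = true; },
    stop() { streaming = false; },
    isStreaming() { return streaming; },
  };
}

// Stand-in for the socket: emitting here simulates a message from the server.
const socket = new EventEmitter();
const shown = [];
const feed = createTweetFeed(socket, (tweet) => shown.push(tweet.text));

feed.start();
socket.emit('tweet', { text: 'first' });
feed.stop();
socket.emit('tweet', { text: 'ignored' });
console.log(shown); // [ 'first' ]
```

On a real page, `feed.stop` would be wired to the stop button's click handler, and the same handler could also ask the server to close the stream.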


Let’s switch over to the back-end side of things and start writing our /search API.
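
As a rough sketch of what such an endpoint could look like, the handler below validates the query and hands it to a function that starts the stream. The request field `query`, the response shape, and the injected `startStream` function are assumptions for illustration, not the repository's actual code. The handler is built by a factory so it can be exercised without Express.

```javascript
// Factory for the /search handler. `startStream` is a hypothetical function
// that opens the Twitter stream for a query; it is injected so the handler
// can be exercised without Express or Twitter credentials.
function makeSearchHandler(startStream) {
  return function searchHandler(req, res) {
    const query = req.body && typeof req.body.query === 'string'
      ? req.body.query.trim()
      : '';
    if (!query) {
      return res.status(400).json({ error: 'query is required' });
    }
    startStream(query);
    return res.status(200).json({ streaming: true, query });
  };
}

// Stand-in for an Express response, just enough for the demo.
function fakeRes() {
  const res = { statusCode: null, body: null };
  res.status = (code) => { res.statusCode = code; return res; };
  res.json = (payload) => { res.body = payload; return res; };
  return res;
}

const started = [];
const handler = makeSearchHandler((q) => started.push(q));
const res = handler({ body: { query: ' #redis ' } }, fakeRes());
console.log(res.statusCode, started); // 200 [ '#redis' ]
```

With Express, the handler would be registered as `app.post('/search', makeSearchHandler(startStream))` after a JSON body parser has been set up.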


The /search handler is the core of our back-end. Once a request has been received at /search, we start the stream using Twitter’s Streaming API, which returns a stream object.
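
Starting the stream could be sketched as follows. The client's `stream(path, params)` method is an assumption modeled on the `twitter` npm package, so check the API of the library you actually use; `statuses/filter` with a `track` parameter is the Streaming API endpoint for matching tweets by keyword. A fake client is used here so the example runs without credentials.

```javascript
const { EventEmitter } = require('events');

// `twitterClient` is assumed to expose stream(path, params) and to return an
// event emitter. This is an assumption; verify it against your client library.
function startTweetStream(twitterClient, query) {
  // The `track` parameter asks for tweets that match the search text.
  return twitterClient.stream('statuses/filter', { track: query });
}

// Fake client that records how it was called and returns an emitter.
const calls = [];
const fakeTwitter = {
  stream(path, params) {
    calls.push({ path, params });
    return new EventEmitter();
  },
};

const stream = startTweetStream(fakeTwitter, '#nodejs');
console.log(calls[0].path, calls[0].params.track); // statuses/filter #nodejs
```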


We can listen on the stream object for an event called “data”, which is emitted with a new tweet whenever one is available.
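
A minimal sketch of such a listener is shown below, assuming the stream behaves like a Node.js `EventEmitter` that emits `data` for each message and `error` on failure. The function name and the filtering rule are illustrative choices, not the repository's code.

```javascript
const { EventEmitter } = require('events');

// Attach listeners to a stream object. `stream` is assumed to be an
// EventEmitter-style object that emits "data" per message and "error" on failure.
function listenForTweets(stream, onTweet, onError) {
  stream.on('data', (tweet) => {
    // Streams can carry non-tweet messages; only forward objects with text.
    if (tweet && typeof tweet.text === 'string') onTweet(tweet);
  });
  stream.on('error', onError || ((err) => console.error('stream error:', err.message)));
  return stream;
}

const stream = new EventEmitter(); // stand-in for the Twitter stream
const received = [];
listenForTweets(stream, (tweet) => received.push(tweet.text));

stream.emit('data', { text: 'hello #redis' });
stream.emit('data', { delete: { status: {} } }); // not a tweet, skipped
console.log(received); // [ 'hello #redis' ]
```

Registering an `error` listener matters with Node.js emitters: an `error` event with no listener is thrown and would crash the process.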


The “data” object contains the tweet JSON which may look something like this (part of the response has been omitted):
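
For orientation, here is a hypothetical, heavily trimmed payload. The field names follow Twitter's tweet object, but every value is invented for illustration and this is not the response shown in the original post. The small helper shows how the fields used by the front-end could be read.

```javascript
// Hypothetical, heavily trimmed tweet payload. The field names follow
// Twitter's tweet object; the values are invented for illustration.
const sampleTweet = {
  created_at: 'Mon Aug 28 10:00:00 +0000 2017',
  id_str: '1',
  text: 'Caching tweets with #redis',
  user: { screen_name: 'example_user', name: 'Example User' },
  entities: { hashtags: [{ text: 'redis' }] },
};

// Read the two fields the streaming list needs: who tweeted and what.
function describeTweet(tweet) {
  return '@' + tweet.user.screen_name + ': ' + tweet.text;
}

console.log(describeTweet(sampleTweet)); // @example_user: Caching tweets with #redis
```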


We store this response in a Redis list called “tweets” using LPUSH:
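
The push could be sketched like this, assuming a callback-style Redis client that exposes `lpush(key, value, callback)`, as older versions of the node_redis client do (newer versions use promises instead). A fake client stands in for Redis so the example runs on its own; `saveTweet` is a hypothetical name.

```javascript
// `client` is assumed to be a callback-style Redis client exposing
// lpush(key, value, callback). This is an assumption about your client library.
function saveTweet(client, tweet, callback) {
  // Redis lists hold strings, so the tweet object is serialized first.
  client.lpush('tweets', JSON.stringify(tweet), callback);
}

// Fake client that keeps lists in memory, for demonstration only.
const fakeClient = {
  lists: {},
  lpush(key, value, cb) {
    this.lists[key] = this.lists[key] || [];
    this.lists[key].unshift(value);
    cb(null, this.lists[key].length);
  },
};

saveTweet(fakeClient, { text: 'hi' }, (err, length) => {
  console.log(err, length); // null 1
});
```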


Once the tweet has been saved, we trim the list using LTRIM to keep at most a fixed number of tweets (so the list doesn’t keep growing and use up memory):
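
A minimal sketch of the trim step follows, again assuming a callback-style client with `ltrim(key, start, stop, callback)`. The cap of 100 tweets and the names `MAX_TWEETS` and `trimTweets` are hypothetical.

```javascript
const MAX_TWEETS = 100; // hypothetical cap; pick a value that suits your app

// Keep only the newest MAX_TWEETS entries. `client` is assumed to expose
// ltrim(key, start, stop, callback) in the callback style.
function trimTweets(client, callback) {
  // Indexes are inclusive, so 0..MAX_TWEETS-1 keeps exactly MAX_TWEETS items.
  client.ltrim('tweets', 0, MAX_TWEETS - 1, callback);
}

// Fake client that records the arguments it receives, for demonstration only.
const fakeClient = {
  last: null,
  ltrim(key, start, stop, cb) {
    this.last = [key, start, stop];
    cb(null, 'OK');
  },
};

trimTweets(fakeClient, (err, reply) => {
  console.log(reply, fakeClient.last); // OK [ 'tweets', 0, 99 ]
});
```

Because new tweets are pushed at the head, trimming to the range starting at 0 always discards the oldest entries.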


After trimming the list, we fetch the latest tweet using LRANGE and emit it to the front-end:
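
The fetch-and-emit step might be sketched as below. It assumes a callback-style client with `lrange(key, start, stop, callback)` and an `io` object with `emit(event, payload)`, as a Socket.IO server provides; the event name `tweet` and the function name are hypothetical. Fakes stand in for Redis and Socket.IO so the example is self-contained.

```javascript
// `client` is assumed to expose lrange(key, start, stop, callback) and `io`
// to expose emit(event, payload). The event name 'tweet' is hypothetical.
function broadcastLatestTweet(client, io, callback) {
  // The newest tweet sits at the head of the list, so the range 0..0 returns it.
  client.lrange('tweets', 0, 0, (err, items) => {
    if (err) return callback(err);
    if (items.length === 0) return callback(null, null);
    const tweet = JSON.parse(items[0]); // stored as a JSON string
    io.emit('tweet', tweet);
    return callback(null, tweet);
  });
}

// Fakes for demonstration only.
const fakeClient = {
  lrange(key, start, stop, cb) {
    cb(null, [JSON.stringify({ text: 'newest' })]);
  },
};
const fakeIo = {
  sent: [],
  emit(event, payload) { this.sent.push([event, payload]); },
};

broadcastLatestTweet(fakeClient, fakeIo, (err, tweet) => {
  console.log(tweet.text, fakeIo.sent.length); // newest 1
});
```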


Since this is a demo application, we also need to manually destroy the stream after a specific time so it doesn’t keep writing to Redis:
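
One way to sketch the timed shutdown is shown below. It assumes the stream object exposes a `destroy()` method, as Node.js streams do; the function name, the delay, and the injectable `schedule` parameter are illustrative choices.

```javascript
// Stop a stream after `ms` milliseconds. `stream` is assumed to expose
// destroy(). `schedule` defaults to setTimeout and is injectable for testing.
function stopStreamAfter(stream, ms, schedule = setTimeout) {
  return schedule(() => stream.destroy(), ms);
}

// Stand-in for the Twitter stream, for demonstration only.
const stream = {
  destroyed: false,
  destroy() { this.destroyed = true; },
};

stopStreamAfter(stream, 50);
setTimeout(() => console.log(stream.destroyed), 100); // true
```

The return value is whatever the scheduler returns, so with the default `setTimeout` it can be passed to `clearTimeout` if the user stops the stream earlier.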


And you’re done! Fire up the server using npm start and enjoy the streaming experience.

A demo of the application is available here: https://node-socket-redis-stream-tweet.herokuapp.com/

For deploying this application on Heroku, check out their docs: https://devcenter.heroku.com/categories/deployment

The entire source code is also available on GitHub for you to fork and work on: https://github.com/Scalegrid/code-samples/tree/sg-redis-node-socket-twitter-search/node-socket-redis-twitter-hashtags

As always, if you build something awesome, do tweet us about it @scalegridio

If you need help with Redis hosting and management, reach out to us at support@scalegrid.io for further information.

