Understanding Streams in Node.js

Published Jan 25, 2017. Last updated Mar 18, 2017.

This is another installment in my series of articles, where I try to demystify big words that are used in computer programming. The next big word that I want to tackle is Stream.

What is a Stream?

Before we move on to some code, let's answer some basic questions:

What is a Stream? A stream is an "infinite" flow of data - period. By "infinite" I mean that you don't know its total length up front. It's the opposite of an array, whose length you can always read. You can add new elements to an array, but you can always find out how many items there are. With a stream, you don't know when the data will stop flowing - in a network environment, that is. In C, for example, you can call fseek() to jump to the end of an open file and ftell() to read its length. In Node.js, the stream object itself doesn't report a total size - if you want the size of a file, you have to ask the file system separately (for example, with fs.stat()).

I'm Reading a File as a Stream, So I Must Know the Size - Right?

True, a file can be read as a stream or loaded into memory, but the point is that if you open it as a stream, the stream itself won't tell you how big the file is. You only see one chunk at a time until the data ends.
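
To make this concrete, here is a minimal sketch that compares the two approaches. The function names and the temporary file are my own placeholders for illustration. The stream only knows its total after the last chunk arrives, while fs.stat() asks the file system for the size without reading the contents.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Count the bytes of a file by consuming it as a stream.
// The total is only known once the 'end' event fires.
function countStreamedBytes(filePath) {
  return new Promise((resolve, reject) => {
    let total = 0;
    fs.createReadStream(filePath)
      .on('data', (chunk) => { total += chunk.length; }) // chunk is a Buffer
      .on('end', () => resolve(total))
      .on('error', reject);
  });
}

// Ask the file system for the size up front, without reading the file.
async function sizeFromFileSystem(filePath) {
  const stats = await fs.promises.stat(filePath);
  return stats.size;
}

// Demo with a throwaway file (hypothetical name) in the OS temp directory.
async function demo() {
  const filePath = path.join(os.tmpdir(), `stream-size-demo-${process.pid}.txt`);
  await fs.promises.writeFile(filePath, 'x'.repeat(200000));
  console.log('size from fs.stat:', await sizeFromFileSystem(filePath));
  console.log('size after streaming:', await countStreamedBytes(filePath));
  await fs.promises.unlink(filePath);
}

demo().catch(console.error);
```

Both numbers match, but the streamed one is only available after every chunk has passed through your code.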

A socket is purely a stream...you don't know how much data you are going to get; the data is buffered until the system passes it to your code so you can do something with it. The amount of data in each chunk depends on the network card, the operating system, the speed at which the data arrives, etc.

This means that at any given moment you work with just a small piece of all the information. It's important to understand that your data will be split into pieces and it will be up to you to recombine them into something that makes sense for your situation.

For example, let's say that you want to display a full sentence: "I love the articles that David writes." Your code will get the sentence in the following way:

  1. I love the
  2. articles that Dav
  3. id writes.

Your job as a programmer will be to concatenate all of the data until you detect the "." character. Only then will you be able to display the entire sentence (Read more about sockets here).
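
The recombining step described above could be sketched like this. The sentences() helper is a hypothetical name of mine, and Readable.from() only simulates a socket delivering the three pieces. On a real socket you would probably also call setEncoding('utf8') so that multi-byte characters are not split across chunks.

```javascript
const { Readable } = require('stream');

// Collect chunks from any readable stream and yield complete sentences,
// using "." as the end-of-sentence marker.
async function* sentences(readable) {
  let buffer = '';
  for await (const chunk of readable) {
    buffer += chunk.toString();
    let end;
    // A single chunk may complete zero, one, or several sentences.
    while ((end = buffer.indexOf('.')) !== -1) {
      yield buffer.slice(0, end + 1).trim();
      buffer = buffer.slice(end + 1);
    }
  }
  // Anything left over never received its "."; surface it instead of dropping it.
  if (buffer.trim() !== '') yield buffer.trim();
}

// Simulate a socket that delivers the sentence in three pieces.
const pieces = ['I love the ', 'articles that Dav', 'id writes.'];

(async () => {
  for await (const sentence of sentences(Readable.from(pieces))) {
    console.log(sentence); // prints "I love the articles that David writes."
  }
})().catch(console.error);
```

Nothing is printed for the first two pieces; the sentence only appears once the chunk containing the "." has arrived.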

The Pipe Concept

The Unix | pipe is nothing new. The idea came from Douglas McIlroy at Bell Labs, and pipes were added to Unix in 1973. A pipe's job is to take the output of one program and pass it as input to another one.

In Node.js, we use pipes inside the code to pass the output of one stream to the next. This is seen in Example 3, where we open a file, compress it, and save the result to a new file.
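
For readers who don't have the repository at hand, an open-compress-save chain could look roughly like the sketch below. It is my own illustration, not the code from Example 3, and gzipFile() and the temporary file names are placeholders.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');
const zlib = require('zlib');

// Compress sourcePath into destPath by chaining streams with .pipe().
function gzipFile(sourcePath, destPath) {
  return new Promise((resolve, reject) => {
    const source = fs.createReadStream(sourcePath);
    const gzip = zlib.createGzip();
    const dest = fs.createWriteStream(destPath);

    // .pipe() does not forward errors, so each stage gets its own handler.
    source.on('error', reject);
    gzip.on('error', reject);
    dest.on('error', reject);
    dest.on('finish', resolve); // all compressed data has been written

    source.pipe(gzip).pipe(dest);
  });
}

// Demo with throwaway files (hypothetical names) in the OS temp directory.
async function demo() {
  const source = path.join(os.tmpdir(), `pipe-demo-${process.pid}.txt`);
  const dest = `${source}.gz`;
  await fs.promises.writeFile(source, 'hello streams\n'.repeat(1000));
  await gzipFile(source, dest);
  const { size } = await fs.promises.stat(dest);
  console.log(`compressed file is ${size} bytes`);
  await fs.promises.unlink(source);
  await fs.promises.unlink(dest);
}

demo().catch(console.error);
```

Each .pipe() call returns the destination stream, which is what makes it possible to chain the stages the same way you chain programs with | in a shell.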


Combining Pipes and Streams

By combining these two concepts, we can connect chunks of code, manipulate the data in a very specific way, and pass it to the next piece of code.
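
As a small illustration of that idea, here is a sketch where a Transform stream sits in the middle of a pipe and changes every chunk on its way through. The helper names (createUpperCase, shout) are made up for this example, and I use stream.pipeline() instead of .pipe() because it forwards errors from every stage.

```javascript
const { Readable, Transform, Writable, pipeline } = require('stream');
const { promisify } = require('util');

const pipelineAsync = promisify(pipeline);

// A Transform receives chunks, changes them, and pushes the result
// to the next stage of the pipe.
function createUpperCase() {
  return new Transform({
    transform(chunk, encoding, callback) {
      callback(null, chunk.toString().toUpperCase());
    },
  });
}

// Run text chunks through the transform and gather whatever comes out.
async function shout(chunks) {
  const received = [];
  const sink = new Writable({
    write(chunk, encoding, callback) {
      received.push(chunk.toString());
      callback();
    },
  });
  await pipelineAsync(Readable.from(chunks), createUpperCase(), sink);
  return received.join('');
}

shout(['streams ', 'and ', 'pipes'])
  .then((result) => console.log(result)) // prints "STREAMS AND PIPES"
  .catch(console.error);
```

The source and the sink know nothing about the transformation, so you can swap the middle stage for compression, parsing, or filtering without touching either end.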

The Best Use Case of a Stream in Node.js

Imagine that you have two hard drives, one with a 100GB file and another with enough space to hold the output. Let's assume that the file is a log file, where we want to extract some useful data.

Loading 100GB into memory on your laptop would not be feasible, but we can solve the problem with Streams. Instead of loading the whole file into memory, the system will load the log file in chunks. This allows your app to use a small, roughly constant amount of RAM, no matter how large the file is.

Basically, your laptop is just a proxy that manipulates the data and dumps the result in another place, thus making it possible to do work that otherwise would be impossible.
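
A rough sketch of that log-extraction scenario follows, assuming the "useful data" is every line that contains a given word. extractLines() and the file names are hypothetical, and a production version would also need error handlers on the input stream.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');
const readline = require('readline');
const { once } = require('events');

// Copy only the lines containing `needle` from inputPath to outputPath,
// handling one line at a time instead of loading the whole file.
async function extractLines(inputPath, outputPath, needle) {
  const input = fs.createReadStream(inputPath);
  const output = fs.createWriteStream(outputPath);
  const lines = readline.createInterface({ input, crlfDelay: Infinity });

  let matched = 0;
  for await (const line of lines) {
    if (!line.includes(needle)) continue;
    matched += 1;
    // write() returns false when the output buffer is full,
    // so wait for 'drain' before sending more.
    if (!output.write(line + '\n')) {
      await once(output, 'drain');
    }
  }
  output.end();
  await once(output, 'finish');
  return matched;
}

// Demo with a tiny stand-in for the 100GB log (hypothetical file names).
async function demo() {
  const logPath = path.join(os.tmpdir(), `log-demo-${process.pid}.log`);
  const outPath = path.join(os.tmpdir(), `log-demo-${process.pid}.out`);
  await fs.promises.writeFile(logPath, 'INFO start\nERROR disk full\nINFO done\n');
  const count = await extractLines(logPath, outPath, 'ERROR');
  console.log(`extracted ${count} line(s)`); // prints "extracted 1 line(s)"
  await fs.promises.unlink(logPath);
  await fs.promises.unlink(outPath);
}

demo().catch(console.error);
```

Waiting for 'drain' is what keeps memory use in check: if the output disk is slower than the input disk, the loop pauses instead of piling up unwritten lines in RAM.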

Code Break Down

Each folder in this repository contains a self-contained piece of code that works. Take the time to read the README.md in each example and don't forget to go over all of my comments. All of this combined should give you a good understanding of what is going on.

The End

If you've enjoyed this article/project, please consider giving it a 🌟. Also check out my GitHub account, where I have other articles and apps that you might find interesting.

Where to follow

You can follow me on social media 🙏😇, at the following locations:

More about me

I don't only live on GitHub, I try to do many things not to get bored 🙃. To learn more about me, you can visit the following links:
