The Fault In Our Responses: Bytes Remaining To Read

Published Nov 10, 2017
The Fault In Our Responses: Bytes Remaining To Read

This is the story on how a single line of code caused a lot of grey hairs and moments of frustration. I never thought I would be writing a post like this, but I guess it’s my time to shine… the light on a hairy bug that halted developer productivity for three full work days.

We’re currently working on a new e-commerce solution for Grano. We aim to provide a better customer experience by modernizing our platform and improving automation and integration to our production services. Our tech stack is reasonably cutting edge, considering that we’re working with an enterprise-level product and environment: Node.js, Angular, Redux, and AWS, accompanied by high code quality and modern development standards.

Day One: The Descent

It all started when we started noticing weird errors in the Chrome DevTools and soon realized that this problem occurred with every error response from our API server.


POST /user/register net::ERR_SPDY_PROTOCOL_ERROR

We had recently switched to HTTPS on our development environment to more closely match the staging and future production environments.

As we can see from the screenshot, AWS Application Load Balancer (ALB) automatically uses HTTP/2 when HTTPS is used. And, as you might know, Google pioneered HTTP/2 with a custom protocol in Chrome called SPDY, which explains the protocol error we were seeing. That led us to believe that the problem had to be in the ALB.

As we were trying to find more information on this issue, we found out that people had had problems on AWS related to the MTU, or, Maximum Transmission Unit size. So, we connected to the API instance to conduct some tests and noticed no problems with the MTU. While connected to the API instance, we decided to test the API server locally.

ubuntu@aws:~$ curl -H “Content-Type: application/json” \ -d ‘{”email”:”asd@example.com”,”password”:”foo"}’ \ http://localhost:8888/user/registercurl: (18) transfer closed with 1 bytes remaining to read{“message”:”Invalid password.”%

That ruled out any issues with the load balancer or HTTP/2. However, we could not replicate the issue locally! So, we thought there must be something wrong in the VPS environment, but it was time to log off and rest our eyes.

Day Two: The Despair

With fresh eyes and a mind free of clutter, we began installing a new VPS, copied the built artifacts,copied the environment configuration, and started the server. But the hopeful mood was soon swept away as we were greeted with the same message: “transfer closed with 1 bytes remaining to read.”

My mind starting to feel blank, I realized that I had not tried running the built version of the back-end locally. You see, we use Babel to enjoy the latest and greatest JavaScript features without worrying about runtime support.

We only install Babel plugins that are needed by modern Node versions and let everything else run natively so that they would be automatically optimized once V8 developers optimize these new language features. For development, we use a handy tool called babel-node. It works exactly like your node binary, except it automatically transpiles your code in-memory.

So we opened the command line in the build directory and ran the transpiled code via regular node and boom, we were able to replicate the issue locally. Progress at last!

So the problem must be caused by Babel, right? That sounded very strange, especially since there should be no difference transpiling in-memory or to file. We aren’t even using many bleeding-edge features in the back-end.

So we started removing Babel plugins and seeing that the problem persisted. One by one, we removed all of the plugins and we still saw the dreaded message: “transfer closed with 1 bytes remaining to read.

We sank into deep despair until we realized that I still had the production environment configuration in the build directory, which meant that I was running the API server with a different configuration than the local development environment. So we copied the local environment, restarted the server, and the problem was gone, just like that.

Now, all we had to do was to copy the production configuration, one variable at a time, until we found which option broke it. Finally, after 61 variables, we found it:

SESSION_REDIS_URL=dev-session.xxxx.yyyy.euw1.cache.amazonaws.com

Locally, we store sessions in-memory. Since we are only running one process, there is no need to have a shared session store. But, as soon as we configured a session database using Redis, we started to get broken error responses.

However, that was only half of the story…

Day Three: The Demystifying

We had pinpointed exactly how to reproduce the problem, so it should be an easy fix. We just had to figure out why the session store broke our responses.

At first, I noticed that the express-session option called saveUninitialized caused the issue. So we started putting together a small Express server using the same configuration as our API server, but we couldn’t get the problem to appear reliably.

First, we noticed that if the Redis host was invalid or a working Redis instance, the problem didn’t occur at all. Then, we saw that if the host was valid, but did not host a Redis instance, the HTTP request timed out and our problem appeared.

That did not match what we saw in our app. In our case, the error occurred straight away without any timeout and only on error responses. In the example, the error only occurred as a result of a timeout and on all requests.

At that point I opened a bug ticket on the connect-redis module, asking if anyone had experienced similar behavior. Marc Harter (@wavded) responded saying that he was only able to reproduce the problem with a timeout or by killing the server before the timeout.

That’s when I remembered that we had some custom error handling logic in a module we use internally. We set up a custom error handler for the Express server to return errors as JSON instead of the default HTML representation.

We also filter out almost all of the error details in production for added security — for those, we have logging in place. We added an error handler and boom, responses started breaking just like they did in our API server.

We noticed right away that the error handler looked a bit strange — why would we call next(err) after responding to the client? Removing that unnecessary call solved our problem, finally!

We opened a new issue on the express-session project to try and figure out why this happens and if it could be detected so that a better error message could be displayed.

Douglas Wilson (@dougwilson) from the Express team immediately responded and explained that nothing can be done, because the next proceeds to the default Express error handler, which checks if the response has been started.

If it has because of the resend, it immediately disconnects the socket. This is because we might be streaming a long response to the client and we don’t want to leave the connection open, wasting our resources when an error occurs.

The problem is that the Redis store is still in the process of saving the session and the error handler disconnects the socket before the response can be ended. There we had it, the fault in our responses.

The Takeaway

The lesson of the story is that even the most mysterious bug can be found and fixed. What you need is determination (not wanting to delay the launch), caffeine, and a method to pinpoint and replicate the bug. To summarize:

  1. Replicate the bug in a development environment with all of your tools available.
  2. If you can’t replicate the bug, compare the environment configurations first before thinking about problems in the code or operating system/hardware.
  3. The bug is most likely in your own code and not in third party code. Check your own code first.
  4. A minimal example to reproduce the bug will help tremendously when testing and is very nice when submitting an issue to an open source project.

Ville works as a Lead Developer at Grano, the most versatile content service provider in Finland and the leading graphic industry company in the Nordics, which is hiring. Currently, he is working on Grano’s E-Commerce solutions.

Discover and read more posts from Ville Lahdenvuo
get started
Enjoy this post?

Leave a like and comment for Ville