
Building Notification Service with AWS Lambda

Published Dec 08, 2017 · Last updated Jun 05, 2018

Cloud services are designed to make our lives easier by taking away the complexity of infrastructure management. At least, they promise to do so.

Because they try to satisfy clients of every size and business model, the official documentation and guides sometimes lack the simple use cases that may be sufficient for small system components.

One of the most popular AWS services is S3. Its application within an organization may vary from simple config storage to a data lake holding tremendous amounts of information.

S3 is strongly integrated with other AWS services and actually allows you to send notifications to SNS, SQS, or Lambda when a change happens within an S3 bucket.

Why would you need any additional notification mechanism if you simply want to know when a new file is added to the bucket?

It turns out that setting up a notification channel between S3 and the consumer services is not that “simple.” For example, your Rails application can use SNS as a notification channel, but your Jenkins server will only work through SQS. Also, consumer services may have different requirements for notifications.

Of the three services mentioned above, Lambda seems like a perfect customizable proxy between S3 and your system. In this post, I’ll show how easy it is to build a configurable notification service with AWS Lambda.

The Consumer Service

Let’s assume we have an analytics service that makes some calculations and stores the results in S3. The results then should be imported into the back-end application database so that they can be presented to users.

Since we would like to avoid any scheduling issues, the goal is to make the whole system more reactive. For this reason, we would like to inform the back end when new results are ready for consumption.
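To make the contract concrete, here is a minimal sketch of the endpoint the consumer might expose. This is my assumption, not part of the original service — the route, framework (Flask), and payload handling are placeholders:

from flask import Flask, request

app = Flask(__name__)

# Hypothetical endpoint; the Lambda function built below will POST
# a JSON body of the form {"s3_path": "s3://bucket/key"} to it.
@app.route('/notifications', methods=['POST'])
def notifications():
    s3_path = request.get_json()['s3_path']
    # Kick off the import of the new results file here.
    return '', 204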

Writing a Function

Assuming we have a bucket where the new results will appear, let’s add a lambda function that will process the incoming events.

S3 Events have a pretty complex structure, but luckily, AWS Lambda provides a test event that will help us in development and testing. I will use Python for the function code, but of course, JavaScript and Java are supported as well.
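For reference, here is an abridged version of the S3 Put test event. I’ve kept only the fields our function will read, and the bucket and key values are placeholders:

{
  "Records": [
    {
      "eventName": "ObjectCreated:Put",
      "s3": {
        "bucket": { "name": "example-bucket" },
        "object": { "key": "results/report.csv" }
      }
    }
  ]
}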

The code for the Lambda function can be divided into 3 modules:

  1. event_parser.py — extracts the path to the file from an S3 event.
  2. notifier.py — sends a notification to the consumer about the new or modified file.
  3. main.py — the main controller of the function that calls the other two modules.

event_parser consists of just one method, which extracts the path to a file added to S3. Even though an event sent by S3 contains a list of ‘Records’ (events), I decided to keep the module simple and have it accept only a single record. That is why you will see the iteration over the record list in main.py later.

from urllib.parse import unquote_plus

def extract_s3_path(event):
    # S3 URL-encodes object keys in event notifications (e.g., a space
    # becomes '+'), so decode the key before building the path.
    key = unquote_plus(event['s3']['object']['key'])
    bucket = event['s3']['bucket']['name']

    return f"s3://{bucket}/{key}"

I will skip the code for tests here, but you can find them in the repo.
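As a quick stand-in, a minimal sanity check might look like this (the bucket and key are made up):

import event_parser

def test_extract_s3_path():
    event = {'s3': {'bucket': {'name': 'example-bucket'},
                    'object': {'key': 'results/report.csv'}}}

    assert event_parser.extract_s3_path(event) == "s3://example-bucket/results/report.csv"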

notifier.py doesn’t know anything about the consumer itself. It simply sends a request built from the consumer's parameters. Note that the requests library is not part of the Lambda Python runtime, so it has to be bundled into the deployment package.

import requests
import logging

logger = logging.getLogger()

def notify(consumer, s3_path):
    request = _build_request(consumer, s3_path)

    logger.info(f"Recepient: {request['url']}\nBody: {request['body']}")

    requests.post(request['url'], json=request['body'], auth=request['auth'])

def _build_request(consumer, s3_path):
    return {
        'url': consumer['url'],
        'auth': (consumer['username'], consumer['password']),
        'body': {'s3_path': s3_path}
    }

main.py collects the config for the consumer and calls notifier with the S3 path parsed by event_parser.

import os
import logging

import notifier
import event_parser

# Lambda configures a handler on the root logger; raise the level so the
# logger.info calls below show up in CloudWatch.
logger = logging.getLogger()
logger.setLevel(logging.INFO)

def handle(s3_event, context):
    consumer = _get_consumer()

    for event in s3_event['Records']:
        s3_path = event_parser.extract_s3_path(event)
        logger.info(f"Extracted path {s3_path}")

        notifier.notify(consumer, s3_path)

def _get_consumer():
    return {
        'url': os.getenv('CONSUMER_URL'),
        'username': os.getenv('CONSUMER_USERNAME'),
        'password': os.getenv('CONSUMER_PASSWORD')
    }

Note that I’ve used an environment variable to specify the endpoint. It adds a bit of flexibility: for example, you can point the function at RequestBin to inspect all of the headers and body contents during development and switch to the real endpoint in production.

Obviously, the credentials are also specified via environment variables. In addition, I’ve inserted some log output to follow the function execution in CloudWatch. The link to the logs can be found directly in the function's AWS Lambda console.
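If you prefer the CLI to the console, the variables can be set like this — the function name s3-notifier and all of the values are placeholders:

aws lambda update-function-configuration \
    --function-name s3-notifier \
    --environment "Variables={CONSUMER_URL=https://example.com/notifications,CONSUMER_USERNAME=user,CONSUMER_PASSWORD=secret}"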

Configuring the Function

As I mentioned earlier, I have added automated tests to make sure the functionality is covered. You will also want to run the function with the Lambda test event to see that everything works as expected.

If you are not yet familiar with AWS, you can easily add your first function by following this tutorial.

Below are some screenshots demonstrating how the environment variables and the test event should be configured.

[Screenshot: Environment Variables]
[Screenshot: Test event configuration]

Adding Function Triggers

In order for the function to be executed, we need to add a trigger for the S3 events. AWS Lambda provides a seamless way of doing this.
Simply go to the Triggers section of your function’s dashboard and follow the step-by-step instructions.

This is how it looked for me at the last step:
[Screenshot: S3 trigger setup]

It is worth mentioning that the multipart upload event (s3:ObjectCreated:CompleteMultipartUpload) is crucial if you plan to upload large files. My experience shows that files > 10MB are considered large by AWS and are uploaded via multipart.
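For completeness, the same trigger can be configured without the console. A sketch using the AWS CLI — the bucket name and function ARN are placeholders — might look like this:

aws s3api put-bucket-notification-configuration \
    --bucket my-results-bucket \
    --notification-configuration '{
      "LambdaFunctionConfigurations": [{
        "LambdaFunctionArn": "arn:aws:lambda:us-east-1:123456789012:function:s3-notifier",
        "Events": ["s3:ObjectCreated:Put", "s3:ObjectCreated:CompleteMultipartUpload"]
      }]
    }'

Note that S3 must also be granted permission to invoke the function (via aws lambda add-permission); the console sets this up automatically when you add the trigger.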

Now you can verify that the function is executed by adding or updating a file in the S3 bucket specified above. You can check the results of the execution either at your test endpoint or in the CloudWatch logs.
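For example, uploading any file with the AWS CLI should trigger a notification (the bucket name is a placeholder):

aws s3 cp report.csv s3://my-results-bucket/results/report.csv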

Conclusion

The above implementation is not sufficient to cover all of the use cases for your notification system. For more complex event processing logic, you would most likely prefer SQS.

Yet this service serves its single purpose well and has several advantages:

  • Uses REST as a unified interface for notifications
  • Eliminates complexity of SNS and SQS setup on the consumer side
  • Has a simple authentication mechanism that is extensible
  • Allows parsing and preprocessing S3 events

To make the service production-ready, you would need to add a bit more flexibility. In the next post, I will demonstrate how the function can be extended to support multiple notification types and consumers.

You can find the complete project here.
