
How and why I built Event data ingestion on AWS using SQS/Kinesis/DynamoDB

Published Jan 23, 2018

About me

Hands-on architect: I architect, design, and code cloud-based software systems and integrations.

The problem I wanted to solve

A huge volume of events needed to be ingested in a short span of time, and actionable intelligence gleaned from it.

The approaches used previously did not scale and did not meet the actionable-intelligence objectives.

What is Event data ingestion on AWS using SQS/Kinesis/DynamoDB?

A staged, event-driven pipeline built on the cloud (AWS). A high volume of events generated by upstream systems is converted into analytics output, which is surfaced as a dashboard that business users can make decisions from.

It involves several stages of event processing, along with NLP techniques to glean knowledge from the events, and then applies analytics patterns to build actionable intelligence.
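To make the staged idea concrete, here is a minimal sketch of what one ingestion stage could look like: poll events from SQS, enrich them (a placeholder for the NLP step), and forward them to a Kinesis stream. The queue URL, stream name, and class names are illustrative assumptions, not the production values, and the AWS SDK for Java (v1) is assumed.

```java
import com.amazonaws.services.kinesis.AmazonKinesis;
import com.amazonaws.services.kinesis.AmazonKinesisClientBuilder;
import com.amazonaws.services.kinesis.model.PutRecordRequest;
import com.amazonaws.services.sqs.AmazonSQS;
import com.amazonaws.services.sqs.AmazonSQSClientBuilder;
import com.amazonaws.services.sqs.model.Message;

import java.nio.ByteBuffer;
import java.util.List;

/** Hypothetical sketch of a single pipeline stage: SQS in, Kinesis out. */
public class IngestStage {

    // Placeholder resource names, not the production values.
    private static final String QUEUE_URL =
            "https://sqs.us-east-1.amazonaws.com/123456789012/raw-events";
    private static final String STREAM_NAME = "enriched-events";

    private final AmazonSQS sqs = AmazonSQSClientBuilder.defaultClient();
    private final AmazonKinesis kinesis = AmazonKinesisClientBuilder.defaultClient();

    public void pollOnce() {
        List<Message> messages = sqs.receiveMessage(QUEUE_URL).getMessages();
        for (Message msg : messages) {
            String enriched = enrich(msg.getBody()); // placeholder for the NLP/enrichment step
            kinesis.putRecord(new PutRecordRequest()
                    .withStreamName(STREAM_NAME)
                    .withPartitionKey(msg.getMessageId()) // spread records across shards
                    .withData(ByteBuffer.wrap(enriched.getBytes())));
            sqs.deleteMessage(QUEUE_URL, msg.getReceiptHandle()); // ack only after forwarding
        }
    }

    private String enrich(String rawEvent) {
        return rawEvent; // the real pipeline applied NLP and enrichment here
    }
}
```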

Tech stack

AWS (SQS, Kinesis, DynamoDB, Redshift, ECS, Lambda).

Compared to the cost of running the same set of technologies in a data center, AWS provides flexibility and scale with minimal up-front expenditure.

The process of building Event data ingestion on AWS using SQS/Kinesis/DynamoDB

The process started with analyzing the critical requirements and mapping them to a set of possible architectures and technologies. AWS was chosen as the primary IaaS provider for reasons of cost and scale.

Each stage of the processing pipeline was built in isolation so it could be tested (with mocks), then gradually integrated into the complete multi-stage pipeline, with each stage acting as a micro-service with its own datasource.
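One way to keep a stage testable in isolation is to define a small contract for it and hide its datasource behind an interface. The names below (PipelineStage, EventStore, DeduplicationStage) are hypothetical, intended only to illustrate the pattern.

```java
/**
 * Hypothetical stage contract: every micro-service stage exposes the same shape,
 * so it can be unit-tested with mocks before being wired into the full pipeline.
 */
interface PipelineStage<I, O> {
    O process(I input);
}

/** Hypothetical abstraction over a stage's own datasource (e.g. DynamoDB). */
interface EventStore {
    boolean seen(String eventId);
    void record(String eventId);
}

/** Example stage: drops duplicate events, backed only by the EventStore interface. */
class DeduplicationStage implements PipelineStage<String, String> {

    private final EventStore store;

    DeduplicationStage(EventStore store) {
        this.store = store;
    }

    @Override
    public String process(String eventId) {
        if (store.seen(eventId)) {
            return null;        // duplicate: drop it
        }
        store.record(eventId);
        return eventId;         // first occurrence: pass it along
    }
}
```

Because the stage depends only on the interface, the real DynamoDB-backed store can be swapped for a mock or an in-memory fake during development.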

All of the pipeline was built in Java; instrumentation and automation were done with Python, Terraform, Ansible, Jenkins, and Maven.

Challenges I faced

Getting a grip on DynamoDB's partitioning was challenging when data was being written at a very rapid rate and then read at an even higher scale. Various indexing and caching mechanisms alleviated those issues.
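One common mitigation for hot partitions under heavy writes is to salt the partition key so a single busy key is spread over several physical partitions. The sketch below assumes the AWS SDK for Java (v1) and uses illustrative table and attribute names; it is not the article's actual schema.

```java
import com.amazonaws.services.dynamodbv2.AmazonDynamoDB;
import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClientBuilder;
import com.amazonaws.services.dynamodbv2.model.AttributeValue;

import java.util.HashMap;
import java.util.Map;

/** Hypothetical write path showing a salted partition key to spread hot writes. */
public class EventWriter {

    private final AmazonDynamoDB dynamo = AmazonDynamoDBClientBuilder.defaultClient();

    public void save(String tenantId, String eventId, String payload) {
        // Suffix the partition key so one busy tenant does not hammer a single partition.
        String saltedKey = tenantId + "#" + (eventId.hashCode() & 0x7); // 8 write shards

        Map<String, AttributeValue> item = new HashMap<>();
        item.put("pk", new AttributeValue(saltedKey));
        item.put("sk", new AttributeValue(eventId));
        item.put("payload", new AttributeValue(payload));

        dynamo.putItem("events", item); // readers fan out over the 8 suffixes and merge results
    }
}
```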

Balancing cost versus functionality was another challenge, since every resource used on the cloud costs money and has to be taken into account when designing, coding, testing, and deploying software onto the cloud.

Key learnings

The main learning was that AWS is an easy platform to code for and scale rapidly, but it comes with some downsides, as mentioned above.

There were other AWS services that could have delivered some of the functionality coded as part of the pipeline, but they carried risk because of the maturity of those services at the time.

Tips and advice

My advice is to always build testable components that can be developed, tested, and deployed in isolation, and then combined with other components to build the system as a whole.
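As a small illustration of that advice, the hypothetical stage sketched earlier can be exercised with an in-memory fake and no AWS resources at all:

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical in-memory fake for the EventStore interface sketched earlier. */
class InMemoryEventStore implements EventStore {
    private final Set<String> seen = new HashSet<>();
    @Override public boolean seen(String eventId) { return seen.contains(eventId); }
    @Override public void record(String eventId)  { seen.add(eventId); }
}

class DeduplicationStageCheck {
    public static void main(String[] args) {
        PipelineStage<String, String> stage = new DeduplicationStage(new InMemoryEventStore());
        // Run with -ea to enable assertions.
        assert "e-1".equals(stage.process("e-1")); // first occurrence: passed along
        assert stage.process("e-1") == null;       // duplicate: dropped
    }
}
```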

Final thoughts and next steps

The system has met its objectives. The next step is integration with services outside of AWS, which requires a complete set of new services and resources to support inter-cloud use.
