How and why I built AWS Hybrid Data Ecosystem
About me
I am a Data Engineer with a passion for analytics and for building advanced streaming analytics applications.
The problem I wanted to solve
I wanted to ingest data from on-premises systems into the cloud, and then let third parties consume that data from the cloud.
What is AWS Hybrid Data Ecosystem?
An end-to-end solution for ingesting data into the cloud and letting others consume it from there.
Tech stack
Spring Boot, Amazon S3, Apache Kafka, Amazon Athena
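To make the stack a little more concrete, here is a minimal sketch of one way records coming off Kafka could be laid out in S3 so that Athena can query them by date. It is an illustration under assumptions, not the project's actual code: the class name `S3KeyLayout`, the `raw/` prefix, the `.json` suffix, and the file naming are all hypothetical. The only technique it relies on is the Hive-style `key=value` partition path, which Athena can use to load partitions.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

/** Hypothetical helper: builds Hive-style partitioned S3 object keys. */
final class S3KeyLayout {

    // Renders an instant as "year=2023/month=04/day=05" in UTC.
    private static final DateTimeFormatter DAY_PARTITION =
            DateTimeFormatter.ofPattern("'year='yyyy/'month='MM/'day='dd")
                    .withZone(ZoneOffset.UTC);

    private S3KeyLayout() {
    }

    /**
     * Builds an object key for one batch of records.
     * The topic, partition, and offset are assumed to come from the Kafka
     * consumer; the "raw/" prefix and ".json" suffix are illustrative only.
     */
    static String objectKey(String topic, int partition, long offset, Instant eventTime) {
        return "raw/" + topic + "/"
                + DAY_PARTITION.format(eventTime)
                + "/part-" + partition + "-offset-" + offset + ".json";
    }
}
```

With a layout like this, a consumer writing a batch from topic `orders` would call `S3KeyLayout.objectKey("orders", 3, 42L, eventTime)` and upload the batch under the returned key. Including the Kafka partition and offset in the file name keeps keys unique and makes re-uploads of the same batch overwrite rather than duplicate.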
The process of building AWS Hybrid Data Ecosystem
I visualized the idea, leveraged existing components, and then implemented it at scale.
Challenges I faced
Security and authentication.
Key learnings
Best practices for deciding when to stay on-premises and when to be in the cloud.
Tips and advice
Make full use of the AWS SDK.
Final thoughts and next steps
In my experience, cloud storage can solve roughly 90% of the problems that come up when developing a data lake.