How and why I built Automate data ingestion system

Published Feb 10, 2021. Last updated Feb 17, 2021.
About me

Data engineer with 10 years of experience interpreting and analyzing data to drive successful business solutions. Expertise in data modeling, database performance tuning, and ETL pipeline development using the Python ecosystem and SQL.

The problem I wanted to solve

The company relied heavily on .xlsx files. Users constantly lost track of which file held the latest update, and a lot of work had to be redone. So I designed a data warehouse and an ETL dataflow to consolidate the data into a central database.

What is Automate data ingestion system?

A data warehouse that provides up-to-date information to users so they can carry out their data analysis tasks.

Tech stack

A data warehouse and an ETL pipeline built with the Python ecosystem and SQL.
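
As a rough illustration of this kind of stack, here is a minimal ETL sketch that uses only Python's standard library and SQLite. It is not the pipeline from this project: the `orders` table, its columns, and the sample rows are hypothetical, and the in-memory rows stand in for data that would really be read from the .xlsx files. The load step uses `INSERT OR REPLACE` keyed on `order_id`, so a later revision of a record overwrites the earlier one instead of creating a duplicate.

```python
import sqlite3


def extract():
    # Hypothetical rows standing in for one spreadsheet export;
    # a real pipeline would read these from the .xlsx files.
    return [
        {"order_id": "1001", "customer": " Acme ", "amount": "250.50"},
        {"order_id": "1002", "customer": "Globex", "amount": "99.90"},
        # A later revision of order 1001, as happens when a file is updated.
        {"order_id": "1001", "customer": "Acme", "amount": "260.00"},
    ]


def transform(rows):
    # Trim stray whitespace and convert text cells to proper types.
    return [
        (int(row["order_id"]), row["customer"].strip(), float(row["amount"]))
        for row in rows
    ]


def load(conn, records):
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders ("
        "order_id INTEGER PRIMARY KEY, "
        "customer TEXT NOT NULL, "
        "amount REAL NOT NULL)"
    )
    # OR REPLACE makes the load idempotent: the newest row per order_id wins.
    conn.executemany(
        "INSERT OR REPLACE INTO orders (order_id, customer, amount) "
        "VALUES (?, ?, ?)",
        records,
    )
    conn.commit()


def run_pipeline(conn):
    load(conn, transform(extract()))
    return conn.execute(
        "SELECT order_id, customer, amount FROM orders ORDER BY order_id"
    ).fetchall()
```

Calling `run_pipeline` twice on the same connection leaves the table unchanged, which is the property that matters when the same spreadsheet gets ingested more than once.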

The process of building Automate data ingestion system

First, I mapped the data sources. Second, I interviewed stakeholders and key users to understand how the business operations were carried out. With that understanding, I designed a dimensional model that maps the core entities of the data flows within the business activities. Finally, I developed and tested the ETL pipeline, which performed well, and deployed the data warehouse database.
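
To make the dimensional modeling step more concrete, here is a small star schema sketch in SQLite. The entities are assumptions chosen for illustration (a customer dimension, a date dimension, and a sales fact table); the post does not say which entities the real model contains. The helper functions look up or create a dimension row and return its key, which is the usual pattern when loading facts.

```python
import sqlite3

# Hypothetical star schema: two dimensions and one fact table.
SCHEMA = """
CREATE TABLE dim_customer (
    customer_key  INTEGER PRIMARY KEY AUTOINCREMENT,
    customer_name TEXT NOT NULL UNIQUE
);
CREATE TABLE dim_date (
    date_key  INTEGER PRIMARY KEY,  -- e.g. 20210210
    full_date TEXT NOT NULL
);
CREATE TABLE fact_sales (
    customer_key INTEGER NOT NULL REFERENCES dim_customer(customer_key),
    date_key     INTEGER NOT NULL REFERENCES dim_date(date_key),
    amount       REAL NOT NULL
);
"""


def customer_key(conn, name):
    # Insert the customer only if it is new, then return its surrogate key.
    conn.execute(
        "INSERT OR IGNORE INTO dim_customer (customer_name) VALUES (?)", (name,)
    )
    row = conn.execute(
        "SELECT customer_key FROM dim_customer WHERE customer_name = ?", (name,)
    ).fetchone()
    return row[0]


def date_key(conn, iso_date):
    # Derive a YYYYMMDD integer key from an ISO date string.
    key = int(iso_date.replace("-", ""))
    conn.execute(
        "INSERT OR IGNORE INTO dim_date (date_key, full_date) VALUES (?, ?)",
        (key, iso_date),
    )
    return key


def load_sale(conn, name, iso_date, amount):
    conn.execute(
        "INSERT INTO fact_sales (customer_key, date_key, amount) VALUES (?, ?, ?)",
        (customer_key(conn, name), date_key(conn, iso_date), amount),
    )


def total_by_customer(conn):
    # A typical analytical query: join the fact table to a dimension.
    return conn.execute(
        "SELECT c.customer_name, SUM(f.amount) "
        "FROM fact_sales f "
        "JOIN dim_customer c ON c.customer_key = f.customer_key "
        "GROUP BY c.customer_name ORDER BY c.customer_name"
    ).fetchall()


def build_demo():
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    # Made-up sample facts, for illustration only.
    load_sale(conn, "Acme", "2021-02-10", 100.0)
    load_sale(conn, "Acme", "2021-02-11", 200.0)
    load_sale(conn, "Globex", "2021-02-10", 50.0)
    conn.commit()
    return conn
```

Keeping descriptive attributes in the dimensions and only keys and measures in the fact table is what lets analysts slice the same numbers by customer or by date without touching the original spreadsheets.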

Challenges I faced

The human factor is by far the biggest challenge you face when gathering the right requirements during the dimensional model design. On top of that, developing an ETL system is a challenging and time-consuming task, because you must pay attention to the details to build it properly.

Key learnings

I learned a lot about the business process and about how to communicate better with stakeholders and users, so that they feel comfortable describing their daily activities. What I would do differently is the choice of users to interview, because the most experienced person is not the manager but the operational worker who performs the job routinely.

Tips and advice

Focus on the business domain. Once you fully grasp how the business process works, you will be able to design effective solutions with your technology stack.

Final thoughts and next steps

This is an ongoing project, because what I built is only the foundation of the whole. I must keep scaling the analytics side to meet the users' needs.

Discover and read more posts from Jayron Soares