Peter Ngigi

Mentor
Rising Codementor
US$0.00
For every 15 mins
ABOUT ME

Seasoned engineer with 5+ years of experience building scalable, data-intensive applications and robust data pipelines, and engineering production-ready machine learning models.

Nairobi (+03:00)
Joined August 2023
EXPERTISE
7 years experience
5 years experience
4 years experience

REVIEWS FROM CLIENTS

Peter's profile has been carefully vetted and approved as a Codementor. Connect with Peter now, and leave a review for them once you're done!
SOCIAL PRESENCE
GitHub
room-allocator
Office space allocator command line app
Python
bootcamp18
EMPLOYMENTS
Senior Software Engineer
Datapeople
2024-02-01 to Present

- Refactored a large-scale ETL pipeline to run end to end in 50% less time by rearchitecting the data models, speeding up API authentication, and parallelizing the ingestion processes.

Python
MySQL
Docker
AWS (Amazon Web Services)
Senior Data Engineer
Indeed
2022-06-01 to 2023-06-01
  • Created custom Python scripts to automate critical processes, such as creating IAM users, EMR clusters, and AWS Glue schemas, as part of a project to migrate 300 index builders from an on-premises Hadoop cluster to AWS EMR.
  • Coordinated with the owners of ETL pipelines and internal libraries to modify them for deployment in AWS, and iteratively debugged bottlenecks in the migration process.
  • Led the engineering of a generalized solution to migrate large artifacts used by multiple ETL jobs from HDFS to AWS S3, while guaranteeing no adverse impact on performance or processing time.
Python
Apache Hadoop
EMR
AWS (Amazon Web Services)
Senior Data Engineer
Automattic
2021-08-01 to 2022-06-01
  • Migrated over 100 ETL pipelines from an old Hadoop cluster to a new cluster, using Airflow and the Python library Skein to submit jobs to Hadoop YARN.
  • Implemented data serialization to Hive using PyArrow and seamless writes to and reads from HDFS and Hive using Ibis/HDFSCLI.
  • Ensured no adverse impact to model metrics by conducting data comparisons and validation, including row aggregate counts for thousands of Hive tables and machine learning model outputs.
Python
SQL
HDFS
Apache Hadoop
Airflow