Principal Data Engineer
PicPay
2023-01-01 – 2025-04-01
• Led the development of a company-wide data platform, defining architecture, roadmap, and release priorities based on both business and technical needs.
• Served as the technical lead for multiple areas, including Data Ingestion, Machine Learning, Data Platform, Data Governance, and Analytics Engineering, supporting a 60-person cross-functional team.
• Designed and implemented scalable real-time pipelines using Debezium, Kafka, and Apache Spark Streaming on Kubernetes, enabling low-latency decision-making. Standardized observability using Prometheus and Grafana.
• Built a unified, object-oriented ingestion framework in Python that loads data from multiple sources into the Data Lake, reducing code duplication and operational incidents.
• Designed an AWS solution for archiving, purging, and serving historical data, with services running on Kubernetes and exposed to backend microservices. This reduced MTTR and promoted data reuse across the company.
• Collaborated with ML and Data Engineering teams to identify overlapping services and consolidate them into a single fault-tolerant platform, reducing operational costs and duplicated effort.
Stack: AWS, Databricks, MySQL, MongoDB, Apache Spark, Airflow, Debezium, Kafka, Spark Streaming, Terraform, CI/CD, Kubernetes, SageMaker, Python, Prometheus, Grafana
MySQL
MongoDB
AWS EMR
Data Lake
AWS Athena
Databricks
Debezium
Staff Data Engineer
Banco Original
2019-07-01 – 2022-12-01
• Led the development of data pipelines focused on CRM, Customer Life Cycle, and Growth initiatives, reporting directly to the Executive Director.
• Acted as the technical lead for a 10-person squad composed of data engineers, ML engineers, and data scientists.
• Built a high-throughput ETL architecture using Python and Apache Spark on Hadoop to create the company’s Data Lake and integrate it with the CRM platform for campaign data refreshes.
• Negotiated with cross-functional technology teams and led the onboarding and configuration of the company’s first cloud provider. Migrated all data pipelines to serverless, scalable services on Google Cloud, reducing pipeline processing time by 80%.
• Designed and deployed the company’s first real-time ML model for payment default prediction, based on behavioral data from mobile interactions. The model was hosted on Vertex AI, with a BigQuery feature store and real-time integration with the loan product backend systems.
Stack: GCP, BigQuery, Oracle, Dataflow, Apache Beam, Hadoop, Hive, AWS, Data Lake, Terraform, Python
ETL
Google Pub/Sub
Apache Beam
Google Cloud Functions
Google Dataflow
Vertex AI