Thales Gibbon

Rising Codementor

US$15.00

For every 15 mins

ABOUT ME

Data Engineer | Cloud | 3x GCP Certified

Senior Data Engineer with 8+ years of experience building scalable data platforms and pipelines in both batch and real-time environments. Specialized in distributed data processing, cloud architecture, and machine learning infrastructure. Proven expertise in designing resilient systems using Apache Spark, Apache Beam, Airflow, and Terraform on GCP and AWS. Highly skilled in Python and SQL, with hands-on experience integrating data products with data warehouses, data lakes, and ML/AI models. Effective communicator with cross-functional teams, including data analysts, data scientists, DevOps, and business stakeholders.

Brasilia (-03:00)

Joined April 2024

EXPERTISE

Python

8 years experience

SQL

8 years experience

Google Cloud Platform

3 years experience

PostgreSQL

4 years experience

Web Scraping

1 year experience

Cloud Data Warehouse

8 years experience

REVIEWS FROM CLIENTS

Thales's profile has been carefully vetted and approved as a Codementor. Connect with Thales now, and leave a review for them once you're done!

SOCIAL PRESENCE

GitHub

prj-hack-delorean-12345

Dart

dronepylot

Python

EMPLOYMENTS

Senior Data Engineer

Lumenalta

2025-05-01 to Present

Principal Data Engineer

PicPay

2023-01-01 to 2025-04-01

• Led the development of a company-wide data platform, defining architecture, roadmap, and release priorities based on both business and technical needs.

• Served as the technical lead for multiple areas, including Data Ingestion, Machine Learning, Data Platform, Data Governance, and Analytics Engineering, supporting a 60-person cross-functional team.

• Designed and implemented scalable real-time pipelines using Debezium, Kafka, and Apache Spark Streaming on Kubernetes, enabling low-latency decision-making. Standardized observability using Prometheus and Grafana.

• Built a unified ingestion framework in Python applying object-oriented programming principles, reducing code duplication and operational incidents, while enabling ingestion from multiple data sources into the Data Lake.

• Designed a solution on AWS for archiving, purging, and serving historical data, exposing services via Kubernetes to backend microservices, reducing MTTR and promoting data reusability across the company.

• Collaborated with ML and Data Engineering teams to identify overlapping services and consolidated them into a single fault-tolerant platform, reducing operational costs and increasing system synergy.

Stack: AWS, Databricks, MySQL, MongoDB, Apache Spark, Airflow, Debezium, Kafka, Spark Streaming, Terraform, CI/CD, Kubernetes, SageMaker, Python, Prometheus, Grafana

MySQL

MongoDB

AWS EMR

Data Lake

AWS Athena

Databricks

Debezium

Staff Data Engineer

Banco Original

2019-07-01 to 2022-12-01

• Led the development of data pipelines focused on CRM, Customer Life Cycle, and Growth initiatives, reporting directly to the Executive Director.

• Acted as the technical lead for a 10-person squad composed of data engineers, ML engineers, and data scientists.

• Built a high-throughput ETL architecture using Python and Apache Spark on Hadoop to create the company’s Data Lake and integrate it with the CRM platform for campaign data refreshes.

• Negotiated with cross-functional tech teams and led the configuration of the company’s first cloud provider. Migrated all data pipelines to Google Cloud using serverless and scalable services, reducing pipeline processing time by 80%.

• Designed and deployed the company’s first real-time ML model for payment default prediction based on behavioural data from mobile interactions. The model was hosted on Vertex AI, with a BigQuery feature store and real-time integration with loan product backend systems.

Stack: GCP, BigQuery, Oracle, Dataflow, Apache Beam, Hadoop, Hive, AWS, Data Lake, Terraform, Python

ETL

Google Pub/Sub

Apache Beam

Google Cloud Functions

Google Dataflow

Vertex AI