Faisal Malik

Mentor
Rising Codementor
US$10.00
For every 15 mins
ABOUT ME
Data Engineer

Data Engineer with experience at several startups, currently at one of the world's largest consulting firms. At startups I gained the technical skills to build end-to-end data pipelines from scratch: ingesting data from various sources, processing it, loading it into the data destination, optimizing storage-compute decoupling, modeling the destination, and implementing data governance for business users. These skills allow me to turn messy raw data into actionable, data-driven insights. I have also developed my soft skills, especially in my current position at McKinsey, where I have learned a great deal about consulting, problem solving, team leadership, and more.

Indonesian, English
Singapore (+08:00)
Joined November 2022
EXPERTISE

REVIEWS FROM CLIENTS

Faisal's profile has been carefully vetted and approved as a Codementor. Connect with Faisal now, and leave a review for them once you're done!
SOCIAL PRESENCE
GitHub
data-driven-growth
Data Science and Machine Learning Implementation in Python to gather some insights for company growth
Jupyter Notebook
Paillier-Linear-ML
Implementation of Paillier's Homomorphic Encryption to the Linear Machine Learning Model.
Python
EMPLOYMENTS
Senior AI Engineer
US Biolab
Feb 2024 – Present
  • Design an end-to-end machine learning workflow for semantic segmentation of biospecimens, from data labeling to model serving, in the AWS ecosystem.
  • Provision serverless infrastructure on AWS using Lambda, SQS, DynamoDB, S3, ECS, and Fargate to enable tile-level model inference parallelization.
  • Automate experimentation, training, and hyperparameter tuning using a GitHub Actions dispatch workflow that provisions on-demand runners on EC2 to execute the jobs.
  • Implement various image transformations to improve segmentation results.
Python
TypeScript
Rust
PyTorch
Semantic segmentation
AWS
Bunjs
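The tile-level parallelization described above can be sketched roughly as follows. This is a minimal illustration, not the actual production code: the function names (`tile_keys`, `enqueue_tiles`), the message schema, and the tile size are all assumptions; only the boto3 SQS `send_message` call reflects the real AWS SDK.

```python
import json
from typing import Iterator

def tile_keys(width: int, height: int, tile: int) -> Iterator[dict]:
    """Yield one message body per tile of a large slide image.

    Each message identifies a pixel window so a Lambda worker can
    fetch just that region and run segmentation on it.
    """
    for y in range(0, height, tile):
        for x in range(0, width, tile):
            yield {
                "x": x,
                "y": y,
                "w": min(tile, width - x),   # clip the last column
                "h": min(tile, height - y),  # clip the last row
            }

def enqueue_tiles(sqs_client, queue_url: str, image_key: str,
                  width: int, height: int, tile: int = 1024) -> int:
    """Send one SQS message per tile; Lambda consumers scale out to
    process tiles in parallel and write masks back to S3."""
    sent = 0
    for window in tile_keys(width, height, tile):
        sqs_client.send_message(
            QueueUrl=queue_url,
            MessageBody=json.dumps({"image": image_key, **window}),
        )
        sent += 1
    return sent
```

Fanning out one queue message per tile is what lets SQS-triggered Lambdas (or Fargate tasks) process a whole-slide image concurrently instead of sequentially.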
Senior MLOps Engineer
Walleye Capital
Nov 2024 – Nov 2025
  • Migrated all async OpenAI API calls in the Kubeflow pipeline to the OpenAI Batch API, which reduced OpenAI token consumption costs by 50% and eliminated all parallel invocation instances from Kubeflow components.
  • Standardized the pipeline into multiple reusable Kubeflow components, allowing all team members to productionize their pipelines seamlessly.
  • Set up a CI/CD pipeline to test and deploy from GitHub to the GCP ecosystem.
  • Set up a GitHub Actions workflow dispatch to submit and schedule Vertex AI Pipelines.
  • Provisioned and managed GCP infrastructure using IaC tools like Pulumi.
  • Refactored experimental code from data scientists to meet production and deployment standards.
  • Integrated different services and components built by other team members so the system can run smoothly and efficiently.
Python
SQL
Google BigQuery
OpenAI
Vertex AI
RAG
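The async-to-Batch migration above follows the standard OpenAI Batch API pattern: instead of firing many parallel chat-completion calls, requests are written to a JSONL file and submitted as one batch, which OpenAI bills at a discount. A minimal sketch of that pattern, assuming the official `openai` Python SDK; the helper names, model choice, and `custom_id` scheme are illustrative, not the actual pipeline code:

```python
import json

def batch_lines(prompts: list[str], model: str = "gpt-4o-mini") -> list[str]:
    """Build one JSONL line per prompt in the /v1/chat/completions
    batch request format; custom_id lets results be matched back."""
    lines = []
    for i, prompt in enumerate(prompts):
        lines.append(json.dumps({
            "custom_id": f"req-{i}",
            "method": "POST",
            "url": "/v1/chat/completions",
            "body": {
                "model": model,
                "messages": [{"role": "user", "content": prompt}],
            },
        }))
    return lines

def submit_batch(client, path: str = "requests.jsonl"):
    """Upload the JSONL file and create a batch job that completes
    within 24 hours, replacing many parallel synchronous calls."""
    batch_file = client.files.create(file=open(path, "rb"), purpose="batch")
    return client.batches.create(
        input_file_id=batch_file.id,
        endpoint="/v1/chat/completions",
        completion_window="24h",
    )
```

In a Kubeflow component, the submit step and a later poll/download step replace the fan-out of parallel invocations, which is where both the cost reduction and the simpler DAG come from.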
Senior Data Engineer
Hivello
Jul 2024 – Oct 2025

All-in-One Decentralized Physical Infrastructure (DePIN) manager.

  • Design and implement a Data Ecosystem in GCP from the ground up using Terraform and GitHub Actions.
  • Design and implement log centralization from end-user applications to Cloud Logging.
  • Route the logs hourly to Cloud Storage and transform them on arrival using Cloud Run Jobs, with instance sizes adjusted to the log volume.
  • Store the transformed logs in BigQuery for further analytics, using a materialized view to optimize aggregate retrieval.
  • This strategy enables flexible analytics that accommodate new use cases without requiring changes to the log source.
  • Develop a blockchain indexing framework using a subgraph to index multiple DePINs' on-chain earnings data.
  • This allows the data to be accessed via GraphQL for analytics purposes.
  • Enable self-serve analytics throughout the company by provisioning Metabase and connecting BigQuery as a data source with business-friendly data models.
  • This keeps the company data-driven even without a data analyst translating business questions into SQL queries.
  • Visualize core business metrics in Grafana with optimized performance and views.
Python
Logging
Google BigQuery
Google Cloud Platform
Blockchain
Grafana
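The hourly transform step in the log-centralization bullets above could look roughly like this: a Cloud Run Job reads the exported log lines and flattens them into rows for a BigQuery load. The `timestamp`, `severity`, and `jsonPayload` fields are real Cloud Logging entry fields, but the payload keys (`app`, `message`) and function names are assumed for illustration, not taken from the actual pipeline:

```python
import json

def flatten_entry(raw: str) -> dict:
    """Flatten one exported Cloud Logging JSON entry into a flat
    row suitable for loading into a BigQuery table."""
    entry = json.loads(raw)
    payload = entry.get("jsonPayload", {})
    return {
        "timestamp": entry.get("timestamp"),
        "severity": entry.get("severity", "DEFAULT"),
        "app": payload.get("app"),          # assumed payload key
        "message": payload.get("message"),  # assumed payload key
    }

def transform_batch(lines: list[str]) -> list[dict]:
    """Transform an hourly batch of exported log lines, skipping
    entries that are not valid JSON (e.g. partial writes)."""
    rows = []
    for line in lines:
        try:
            rows.append(flatten_entry(line))
        except json.JSONDecodeError:
            continue
    return rows
```

Transforming into a fixed row schema on arrival is what lets downstream BigQuery materialized views aggregate cheaply, while keeping the end-user applications free to log without schema constraints.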