Phani Kumar Yadavilli

Rising Codementor

US$25.00

For every 15 mins

ABOUT ME

Data Engineer | Mentor | Passionate about learning and teaching.

Results-driven Data Engineer with experience designing and developing terabyte- to petabyte-scale Data Platforms/Data Lakes and ETL frameworks for processing batch and real-time data.

Passionate about Distributed Systems and the Big Data ecosystem's tools and technology stack. Strong acumen in choosing the right tools and technologies for building scalable architectures and platform solutions that process and analyze structured and unstructured datasets, supporting Engineers, Data Analysts, Data Scientists, and many customers. Able to build production-grade Data Pipelines from scratch based on business requirements.

• 9 years of architecting and designing highly scalable Big Data platforms covering Data Ingestion and Data Processing. Experience building homegrown frameworks for Batch Processing and Real-Time/Streaming analytics.

• Built a Streaming Data Analytics platform to process ~500 TB of telemetry events on ~80 PB of raw data per day. Designed Cisco's Data Lake, which serves cross-functional teams as well as Cisco's external customers.

• Instrumental in scaling the Cisco Syslog NG (Next Generation) platform from processing ~13 million events per day to ~55 million events per day. Syslog Next Generation is a highly scalable, highly available, event-driven, real-time Distributed Data Pipeline. It uses Apache Kafka as the message bus for inter-process communication, orchestrating functional services developed in Java and running in Tomcat-based Spring Boot containers. The data is processed by Spark Streaming jobs.

• Saved Cisco Systems Inc. $3,400 per quarter by redesigning and rewriting the Pentaho Data Integration-based ETL pipelines as microservices-based pipelines and running them at scale.

• Led the Data Engineering effort to build a scalable Batch Processing platform that processes Mobility data for Cisco customers, gathering metrics and calculating KPIs to analyze Quality of Experience.

Pacific Time (US & Canada) (-07:00)

Joined July 2021

EXPERTISE

Data Engineering

9 years experience

Java

9 years experience

I have been using Java since the time I started my career. I have built large-scale distributed systems using Java.

Apache Spark

9 years experience

I have extensive hands-on experience working with Apache Spark. I understand the internals of Spark and have built data pipelines that scale the processing logic to handle 55 million events a day. I have fine-tuned Spark data pipelines from scratch.
https://phanikumaryadavilli.medium.com/writing-udfs-user-defined-functions-in-apache-spark-4d263577b729
https://phanikumaryadavilli.medium.com/writing-tests-for-your-spark-code-using-funsuite-71a554f92106
https://phanikumaryadavilli.medium.com/avoiding-spark-to-read-and-generate-crc-and-success-files-c52b300c0a77

Apache Kafka

8 years experience

I have used Kafka both as a distributed message queue and as a stream-processing backend. I understand Kafka in depth. I have built many microservices using the event sourcing pattern, where Kafka was used for Inter-Process Communication.
https://phanikumaryadavilli.medium.com/parsing-apache-kafka-consumer-offsets-using-kafka-command-and-java-api-58880a62371d

Docker

5 years experience

I've deployed projects to various container options, including Docker and Tupperware (Facebook's own container format). We used the Docker format for deploying Spring Boot applications and orchestrated them using Kubernetes.

NoSQL

8 years experience

Data pipeline design

9 years experience

REVIEWS FROM CLIENTS

Phani's profile has been carefully vetted and approved as a Codementor. Connect with Phani now, and leave a review for them once you're done!

SOCIAL PRESENCE

GitHub

KafkaConsumerOffsetsParser

The Kafka ConsumerOffsets topic parser parses the binary data from the consumer offsets topic and converts it into JSON strings, which can then be used for analysis and monitoring.

Java

Orchestra

Orchestra is a Java-based workflow engine for creating complex workflows.

Java

Community Posts

Writing tests for your spark code using FunSuite

One of the questions most frequently asked on Stack Overflow and other forums by Data Engineers who build their data pipelines with Apache Spark is how to write test cases. In this write-up,...

Generating Distributed UUID’s using Zookeeper

I have been exploring how to generate Sequence Ids in an environment where the applications are distributed and decoupled. In my case, I had to generate UUIDs in sequential order. As we all know...

Parsing Apache Kafka __consumer_offsets using Kafka command and Java API

__consumer_offsets is the topic where Apache Kafka stores the offsets. Since the time Kafka migrated the offset storage from Zookeeper to avoid scalability problems, __consumer_offsets is the one topic...

EMPLOYMENTS

Lead Data Engineer

Cisco Systems Inc.

2019-01-01 - Present

At Cisco, I am leading the Data Engineering efforts for the Contact Center Business Unit.

• Designed the Data Ingestion Framework, scaling from 20 TB to 50 TB of raw data per day.

• Designed the Data Processing Framework to process 100 TB to 5 PB of data per day using Spark as the processing engine and HDFS as the storage.

• Integrated the Data Ingestion and Data Processing frameworks using Apache Kafka.

• Implemented best tooling and followed best practices for performance tuning.

• Implemented a monitoring framework to monitor the Data Pipeline using homegrown frameworks and open source frameworks.

Java

Apache Spark

Apache Kafka

Apache Airflow

Apache NiFi

Apache HBase

PROJECTS

Streaming Data Pipeline to process telemetry data from devices.

CapitalOne

2019

Large-scale streaming analytics data pipeline to process device data.

Java

Apache Spark

Apache Kafka
