
How I learned Apache Kafka

Published Sep 30, 2018. Last updated Oct 15, 2018.

About me

I am a data engineer experienced in building scalable applications with big data technologies, and interested in software architecture, problem solving, and clean code.

Why I wanted to learn Apache Kafka

As part of my job building highly scalable and resilient software for a high-traffic system (handling up to 50K msg/sec), we faced many scalability, stability, and logical (request ordering) problems when we used REST APIs, the usual approach, for communication between the data pipeline components. We had to explore other options. An event bus was the way to go, and Kafka was the best choice to serve as the event bus and the communication medium in the data pipeline.

Another big goal was to replace Spark, which was being used incorrectly in our system, with Kafka Streams, a really powerful solution built on top of Kafka.

How I approached learning Apache Kafka

When starting to learn a new technology, I highly recommend beginning with the official docs, especially if the technology is new, since they contain as much information as you need for a good start. I agree that not all official docs are straightforward or clear, but one way or another you will find the information you are looking for, and blogs or quick how-tos will not help you when you face a real production issue.

Once you have a fair knowledge of the technology and it has become part of your stack, you will start to see issues and unexpected behavior. In that case I always follow the approach below:

1- Official docs, to review configs or any pointers that I missed the first time I read them.

2- Open issues on GitHub or Jira.

3- Stack Overflow (funny note: I asked 4 questions on Stack Overflow about Kafka and Spark, and I ended up answering them all myself 😄).

4- Videos on the Confluent Kafka channel, which were really helpful.

Challenges I faced

The main challenge was that Kafka releases came very quickly over the past year, and I had to keep up with the release notes and the issues closed.

Key takeaways

It was a really enjoyable journey, walking through the deepest details of Kafka and getting exposed to a wide range of problems, from developing consumers and producers to deploying an HA cluster.

Tips and advice

To learn Kafka, you need to be familiar with the event-bus architecture and how topics and queues work. You should also take a look at the alternatives to Kafka (RabbitMQ, ZeroMQ), just to understand the differences in design.
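To make the topic idea a bit more concrete, here is a small in-memory sketch in plain Python. It is not Kafka's API: the `ToyTopic` class, its method names, and the `crc32` hash are all hypothetical stand-ins chosen for illustration. It tries to show two design points: records with the same key land in the same partition, so their order is kept, and reading does not remove records, so each consumer group keeps its own position.

```python
from collections import defaultdict
import zlib


class ToyTopic:
    """In-memory stand-in for a partitioned, append-only topic (not Kafka's API)."""

    def __init__(self, num_partitions):
        self.partitions = [[] for _ in range(num_partitions)]
        # Each consumer group tracks its own read position per partition.
        self.offsets = defaultdict(lambda: [0] * num_partitions)

    def partition_for(self, key):
        # Stable hash so the same key always maps to the same partition.
        # crc32 is only a stand-in for whatever hash a real client uses.
        return zlib.crc32(key.encode("utf-8")) % len(self.partitions)

    def send(self, key, value):
        p = self.partition_for(key)
        self.partitions[p].append((key, value))
        return p

    def poll(self, group):
        """Return the records this group has not read yet and advance its offsets."""
        records = []
        for p, log in enumerate(self.partitions):
            start = self.offsets[group][p]
            records.extend(log[start:])
            self.offsets[group][p] = len(log)
        return records


topic = ToyTopic(num_partitions=3)
for i in range(3):
    topic.send("user-a", f"a{i}")
    topic.send("user-b", f"b{i}")

# Two independent groups each see every record; nothing is consumed away.
billing = topic.poll("billing")
audit = topic.poll("audit")
```

In a classic queue, a message delivered to one consumer is typically gone for the others. In this toy model both `billing` and `audit` receive all six records, and the records for `user-a` always come back in the order they were sent.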

Final thoughts and next steps

My next learning goal is to gain more in-depth knowledge of the Kafka ecosystem, as Kafka is not only a message queueing framework: it provides a full-blown stack of data pipeline processing/ETL solutions (Kafka Streams, Kafka Connect, and KSQL by Confluent).

Discover and read more posts from Karim Tawfik