Kafka - what you need to know...

Published Mar 02, 2018

Kafka is a journal- and offset-based broker: messages are appended to a persistent log and each one is addressed by its offset. This allows rapid access to relevant data (usually repetitive, templated data sets).

Because data is fetched from a journal, Kafka can also replay messages. Traditional messaging systems usually cannot: they mostly focus on transactional semantics (process once and only once) to convey a distributed system's state through a decoupled architecture.
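To make the replay idea concrete, here is a minimal sketch of an offset-addressed journal. It is a toy in-memory model, not Kafka's client API: `TopicLog`, `append` and `read_from` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, List, Tuple


@dataclass
class TopicLog:
    """Toy append-only journal: a record's offset is its index in the list."""
    records: List[Any] = field(default_factory=list)

    def append(self, value: Any) -> int:
        self.records.append(value)
        return len(self.records) - 1  # offset assigned to the new record

    def read_from(self, offset: int) -> List[Tuple[int, Any]]:
        # Reading does not remove anything, so a consumer can re-read at will.
        return list(enumerate(self.records[offset:], start=offset))


log = TopicLog()
for event in ["bid-1", "bid-2", "bid-3"]:
    log.append(event)

first_pass = log.read_from(0)  # a consumer reads everything once
replayed = log.read_from(1)    # later it rewinds to offset 1 and replays
```

The point of the sketch is that consuming is just reading from a position, so "replay" amounts to choosing an earlier offset.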

Highly available and highly resilient traditional messaging systems can be hard to scale horizontally (it is possible, but it takes in-depth expertise and configuration). Kafka, by comparison, offers much better performance, scalability, high availability and replication without any major downsides.

The only downsides (if you can call them that) are the underlying journal bloat, which has to be managed just like any other persistence mechanism such as a database, and the fact that pub-sub is the only linkage mechanism between producers and consumers.
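Managing journal bloat mostly comes down to retention: old records are dropped once they pass a time or size limit (Kafka exposes topic settings such as `retention.ms` and `retention.bytes` for this). The sketch below is a simplified, time-based illustration only; `apply_retention` and the record layout `(offset, timestamp, value)` are assumptions, and real brokers delete whole log segments rather than single records.

```python
from collections import deque
from typing import Deque, Tuple

Record = Tuple[int, float, str]  # (offset, timestamp in seconds, value)


def apply_retention(log: Deque[Record], now: float, retention_seconds: float) -> int:
    """Drop records older than the retention window from the head of the log.

    Returns the number of records removed.
    """
    removed = 0
    while log and now - log[0][1] > retention_seconds:
        log.popleft()
        removed += 1
    return removed


journal: Deque[Record] = deque([(0, 100.0, "a"), (1, 160.0, "b"), (2, 220.0, "c")])
dropped = apply_retention(journal, now=230.0, retention_seconds=60.0)
```

Note that the surviving record keeps its original offset (2); trimming the head of the journal does not renumber what remains.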

But its high availability (clustering) and its ability to ingest vast amounts of data from producers and make it available to message consumers (using sharded topics, i.e. partitions) are unparalleled in terms of speed, scale and ease of configuration. That's one of the main reasons it's very popular with real-time streaming and computation systems, typically in industries which rely on sub-second decisions, such as real-time bidding (RTB) and anomaly detection.
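Sharding a topic usually means routing each record to a partition based on its key, so that records with the same key stay together and in order. Here is a rough sketch of that idea; `partition_for` is a hypothetical helper, and `zlib.crc32` is only a stand-in for whatever hash the real partitioner uses.

```python
import zlib
from collections import defaultdict
from typing import DefaultDict, List, Tuple


def partition_for(key: str, num_partitions: int) -> int:
    """Map a key to a partition; the same key always lands on the same partition."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


NUM_PARTITIONS = 3  # assumed partition count for the example
partitions: DefaultDict[int, List[Tuple[str, str]]] = defaultdict(list)

for key, value in [("user-1", "click"), ("user-2", "view"), ("user-1", "buy")]:
    partitions[partition_for(key, NUM_PARTITIONS)].append((key, value))

# All "user-1" events sit in one partition, in the order they were produced.
user1_events = [v for k, v in partitions[partition_for("user-1", NUM_PARTITIONS)] if k == "user-1"]
```

Because each partition can live on a different broker and be read by a different consumer, adding partitions is what lets throughput scale out.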

Stream-based processing (Kafka's Streams API), built on top of the existing topic-based infrastructure, provides a real-time, ESB-like mechanism: it can ingest data streams from topic(s), process them (aggregations, transformations, etc.) and then dispatch the processed data streams to topic(s), so Kafka acts as more than just a data conduit.
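The consume, process, dispatch loop can be sketched in a few lines. This is a plain-Python illustration of the pattern, not the Streams API itself (which is a Java library); `process_stream` and the sample click data are made up for the example.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def process_stream(input_topic: Iterable[Tuple[str, str]]) -> List[Tuple[str, int]]:
    """Read (key, value) records, normalise the key, count per key,
    and emit the updated count for every incoming record."""
    counts: Counter = Counter()
    output_topic: List[Tuple[str, int]] = []
    for key, _value in input_topic:
        normalised = key.strip().lower()                       # transformation
        counts[normalised] += 1                                # aggregation
        output_topic.append((normalised, counts[normalised]))  # dispatch
    return output_topic


clicks = [("Ad-7", "click"), ("ad-9", "click"), (" ad-7", "click")]
result = process_stream(clicks)
```

Each output record is the latest running count for its key, so a downstream consumer of the output topic always sees up-to-date aggregates.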

Kafka's Connector API (Kafka Connect) provides a uniform way (adapters) to ingest real-time events into Kafka from a variety of sources, including databases and event log systems. This makes adopting Kafka in an enterprise easy, cost-effective and highly attractive to business managers.
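As a rough illustration of what a source adapter does, the sketch below polls a "table" for rows it has not seen yet and forwards them to a topic. It assumes rows carry an incrementing `id`; `poll_new_rows`, `orders` and `topic` are hypothetical names, and a real connector is configured rather than hand-written like this.

```python
from typing import Dict, List, Tuple

Row = Dict[str, object]


def poll_new_rows(table: List[Row], last_seen_id: int) -> Tuple[List[Row], int]:
    """Return rows with an id above the cursor, plus the advanced cursor."""
    new_rows = [row for row in table if row["id"] > last_seen_id]
    if new_rows:
        last_seen_id = max(row["id"] for row in new_rows)
    return new_rows, last_seen_id


orders: List[Row] = [{"id": 1, "item": "book"}, {"id": 2, "item": "pen"}]
topic: List[Row] = []  # stands in for the Kafka topic being fed

batch, cursor = poll_new_rows(orders, last_seen_id=0)
topic.extend(batch)

orders.append({"id": 3, "item": "lamp"})  # a new row arrives in the source
batch, cursor = poll_new_rows(orders, cursor)
topic.extend(batch)
```

Tracking a cursor is what keeps each poll from re-sending rows that were already published.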

Kinesis (AWS) provides very similar facilities and benefits to Kafka, but it is managed and provided as a service in the cloud. It scales to similar levels without the need to maintain Kafka clusters and the cost that entails.

Happy event streaming with Kafka (to infinite scale and beyond)!

Discover and read more posts from Senthil M