Top 3 Kafka Books and Tutorials
The popularity of Apache Kafka is not backing down. If anything, its rate of adoption has been something to behold. To put things visually, here are Google search trends for Kafka globally, over the last six years:
If you haven’t gotten your head around Kafka yet, you probably should. To get you going, I have put together a list of well-respected, proven sources of knowledge on Kafka, both free and paid, that should put you well on your way to becoming a Kafka aficionado.
If you want to get properly good at something, you need to invest in training and preparation. Everyone’s experience will vary, but I found I learned the most by reading books that were written by experts.
Effective Kafka: A Hands-On Guide to Building Robust and Scalable Event-Driven Applications
The word “effective” in a book title is normally reserved for the most authoritative texts in their respective fields, such as “Effective C++” by Scott Meyers or “Effective Java” by Josh Bloch. They are the absolute cream of the crop.
So, does the book hold up to the title?
Effective Kafka is the absolute bible on Apache Kafka.
I don’t want to overstate how good this book is, but it would be hard to. To start with, Effective Kafka is not a small book, at nearly 500 pages. But it reads quite well and is intelligently structured, with good grammatical style, a well-thought-out progression and lots of good analogies and illustrations.
The book is a complete guide on the core technology platform and is suitable for all levels of skill — from beginner to intermediate and advanced. It starts by assuming you know nothing and explains the core concepts with lots of diagrams. It then covers off architecture and design considerations, client and broker configuration, operational concerns, security and transactions. The book is a good blend of theory and hands-on practicality, with lots of snippets and examples in Java. I found it to be very up to date, covering off all Kafka features that I knew of and some that I didn’t.
If you are only going to buy one book this year, this is it.
Reasons to buy:
- In-depth coverage of all core concepts and supplementary areas
- Architectural insights and design best-practices
- Suitable for beginner to advanced levels
- Up-to-date coverage of all Kafka features
- Solid value for money and longevity
Reasons to pass:
- None that come to mind
Link to Effective Kafka on the author’s site.
Kafka: The Definitive Guide — Real-Time Data and Stream Processing at Scale
Kafka: The Definitive Guide is one of the pioneering books from none other than the founding maintainers of the platform.
The book is great at explaining the key concepts behind Kafka in a way that is easy to follow and understand. It’s an excellent book for both absolute beginners and intermediate users alike, and is a fantastic complement to the official documentation.
The writing style is simple and concise. The book has a ton of practical examples and technical illustrations.
There are some negatives too. The book is around four years old now and is beginning to show its age. Some of the new features, such as transactions, are not covered. The book sticks to the core topics well but at the expense of coverage of the more advanced topics, such as security.
Reasons to buy:
- Concise delivery and comprehensive coverage of core concepts
- Excellent for beginners and intermediate users
- Good value for money
Reasons to pass:
- Somewhat dated, missing out on new features and tech
- Not suitable for advanced users
Link to Kafka: The Definitive Guide on O’Reilly.
Streaming Architecture: New Designs Using Apache Kafka and MapR Streams
Streaming Architecture is an alternative take on the Apache Kafka literature. Where other books present Kafka as a generic building block, Dunning and Friedman have decided to zero in on the stream processing use case, showcasing not only Apache Kafka but also MapR Streams, which is a complementary technology.
This book is ideal if you have a very specific set of use cases in mind, rather than looking to get into Kafka. Perhaps your work is heavy on stream processing; for example, analytics, complex event processing, windowed aggregation, clickstream analysis, and so on. Streaming Architecture is an excellent opportunity to kill two birds with one stone. (No animals were harmed…)
When considering this book, bear in mind that this is an introductory text. It is not a deep dive by any stretch, and you will likely need to buy another book or take a course to fill in the gaps. That doesn’t make it a bad book, given its two-in-one (Kafka and MapR) proposition, but it is a niche book nonetheless and may not be exactly what you are looking for.
Reasons to buy:
- Focuses on map-reduce style stream processing
- Touches on the negatives of Kafka as well as the positives
- Good use of practical examples to convey the authors’ knowledge
Reasons to pass:
- Light on content (only 120 pages)
- Not suitable beyond a beginner level
- There are better value alternatives
Link to Streaming Architecture on O’Reilly.
While there are some terrific books on Kafka, it doesn’t mean that the free blogs and articles are necessarily bad. Nor does it mean that all books are great—there are certainly a fair number of appalling ones on Amazon.
I’ve included three popular articles on Kafka that should get you started. They are purely introductory, so don’t expect to come out an expert.
Kafka in a Nutshell
Kafka in a Nutshell is exactly what the name suggests. It is a super-concise breakdown of Apache Kafka. What’s more, it’s written by Emil Koutanov—the author of Effective Kafka and the maintainer of Kafdrop.
The article includes a bit of history and some typical use cases and patterns. It then looks at the architecture of the core components (brokers, ZooKeeper nodes, producers and consumers) and how they interrelate.
For its size, the quality of the content and the delivery are excellent. Kafka in a Nutshell gallops through the core concepts, such as topics, partitions, offsets, consumer groups, order of records, parallelism and at-least-once delivery. Sure, it does not have the depth of the book, but it gives you a basic understanding.
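To make a few of those concepts concrete, here is a toy, in-memory sketch. It is not taken from the article and it is not the Kafka client API: `ToyTopic`, `ToyConsumer`, the partition count and the `customer-42` key are all made-up names for illustration, and `zlib.crc32` is only a stand-in for the murmur2 hash that the real Java client uses for keyed records. It shows how records with the same key land in the same partition (and so keep their order), how an offset is just a position within a partition, and why committing offsets after processing gives you at-least-once delivery.

```python
import zlib
from collections import defaultdict

NUM_PARTITIONS = 3  # hypothetical partition count for this toy topic


def partition_for(key):
    """Map a key to a partition. crc32 is a stand-in hash (assumption);
    the real Java client hashes keyed records with murmur2."""
    return zlib.crc32(key.encode("utf-8")) % NUM_PARTITIONS


class ToyTopic:
    """An in-memory topic: each partition is an append-only list."""

    def __init__(self):
        self.partitions = defaultdict(list)

    def publish(self, key, value):
        p = partition_for(key)
        self.partitions[p].append((key, value))
        # The offset is simply the record's position within its partition.
        return p, len(self.partitions[p]) - 1


class ToyConsumer:
    """Reads one partition and commits its offset only after processing."""

    def __init__(self, topic, partition):
        self.topic = topic
        self.partition = partition
        self.committed = 0  # next offset to read after a restart

    def poll_and_process(self, handler, crash_before_commit=False):
        records = self.topic.partitions[self.partition][self.committed:]
        for record in records:
            handler(record)
        if crash_before_commit:
            return  # simulated failure: progress is not saved
        self.committed += len(records)


topic = ToyTopic()
for i in range(3):
    topic.publish("customer-42", f"event-{i}")

p = partition_for("customer-42")
seen = []
consumer = ToyConsumer(topic, p)
consumer.poll_and_process(seen.append, crash_before_commit=True)  # processed, not committed
consumer.poll_and_process(seen.append)  # same records are delivered again
```

After the simulated crash, the three records are handed to the handler a second time, in the same order. That is the practical meaning of at-least-once: nothing is lost, but your processing needs to tolerate duplicates.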
The article also covers tooling, which is something that is often overlooked. There are examples that use the command-line interface to publish and consume messages, so it is a practical “getting started” guide as much as a theoretical primer.
Kafka in a Nutshell also touches on the drawbacks of Kafka and some of the gotchas that practitioners should be aware of. It also talks about performance and how Kafka achieves its throughput. There is also mention of geographical replication, multi-tenancy and security.
Overall, Kafka in a Nutshell has got to be the best all-rounder—a great way to get your feet wet without spending a penny.
Apache Kafka: Getting Started
Apache Kafka: Getting Started is a highly respected introductory guide to the platform, written by one of the authors of Kafka: The Definitive Guide — Gwen Shapira.
The article does the whole “getting started” thing in a slightly untraditional way: it starts with the examples, then explains what they do and how the various components fit into the broader picture. If there is one criticism, it is that the article lacks a clear structure; Gwen jumps from one concept to another in an attempt to cover ground, and the result can feel disjointed at times.
Unlike, say, Kafka in a Nutshell, the article does not attempt to be a complete introduction. It tries to be more hands-on. It gives you a bit of theoretical knowledge and follows with links to other useful resources. We will look at one such resource next.
The Official Documentation
Why go far looking for blogs and other freebies when you have the Official Kafka Documentation at your disposal?
If you’re a seasoned Kafka developer, you have probably read over the official docs countless times. It is an excellent reference guide. But even if you are still finding your bearings, the documentation can be a very valuable resource.
The “Getting Started” section in the official docs gives a compelling overview of the platform. It starts with an introduction to event streaming and some of its potential use cases. It then presents Kafka in light of event streaming, which also helps differentiate it from message queues.
Having introduced Kafka, the guide then covers some of the main concepts and terminology, such as producers, consumers, topics and partitions.
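If the distinction from message queues feels abstract, the following toy sketch may help. It is my own simplification rather than anything from the official docs: `ToyQueue`, `ToyLog` and the `billing` and `analytics` group names are hypothetical, and the log is modelled as a single partition. The point it illustrates is that a queue hands each message to one reader and then forgets it, whereas a log keeps its records and lets each consumer group track its own position.

```python
from collections import deque


class ToyQueue:
    """Classic queue semantics: a message is removed once it is consumed."""

    def __init__(self):
        self._messages = deque()

    def send(self, message):
        self._messages.append(message)

    def receive(self):
        return self._messages.popleft() if self._messages else None


class ToyLog:
    """Log semantics: records stay put; each group tracks its own offset."""

    def __init__(self):
        self._records = []
        self._offsets = {}  # consumer group name -> next offset to read

    def append(self, record):
        self._records.append(record)

    def poll(self, group):
        start = self._offsets.get(group, 0)
        batch = self._records[start:]
        self._offsets[group] = len(self._records)
        return batch


queue = ToyQueue()
log = ToyLog()
for event in ["order-created", "order-paid"]:
    queue.send(event)
    log.append(event)

# Two readers of the queue split the messages between them.
queue_first = queue.receive()
queue_second = queue.receive()

# Two groups reading the log each see every record.
billing = log.poll("billing")
analytics = log.poll("analytics")
```

Here `billing` and `analytics` both receive the full list of events, while the two queue reads return one event each and leave the queue empty.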
One advantage you get with the official docs is the assurance that they are correct and thorough. Beyond a short introductory section, the rest of the documentation reads like a user guide, which is basically what it was designed for.
Kafka is a highly flexible event streaming platform that brings about its own set of concepts and principles. It is not the easiest piece of technology to conquer but the effort will be worth your while. Being a fairly mature platform, it is not short on documentation and other sources of knowledge, but there is also a lot of fluff out there that isn’t helpful for someone just starting out. I hope this list has given you some pointers in the right direction.