How to Choose Between Message DB and Kafka

Message DB and Kafka both live in the world of events and messages. While there’s considerable overlap in the things you can do with them, they’re also very different tools at their extremities. It’s the differences and specializations that inform your choice of using one versus the other.

When you line up the things that are specialized about each tool with the specific requirements of your project, your choice will be clearer. If you focus just on the capabilities that the tools have in common, you’ll still not have much clarity.

Here are some things that both Message DB and Kafka have in common:

They both deal with messages and events
They both serve the needs of microservices and service architectures, as well as data analytics
They both act as message queues and message transports
They both support Pub/Sub
They are both “dumb pipes” in that consumers track their own message offsets
They both support competing consumers
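The "dumb pipes" point in the list above can be sketched in a few lines of Python. This is an in-memory illustration only, not the API of Kafka or Message DB; the `Log` and `Consumer` names are hypothetical. The log only appends and reads from a given position, and each consumer keeps its own position.

```python
from dataclasses import dataclass, field


@dataclass
class Log:
    """Hypothetical append-only log: it stores messages and nothing else."""
    messages: list = field(default_factory=list)

    def append(self, message):
        self.messages.append(message)

    def read_from(self, position, batch_size=10):
        # The log does not know who is reading; callers pass their own position.
        return self.messages[position:position + batch_size]


@dataclass
class Consumer:
    """Hypothetical consumer that owns its read position (offset)."""
    name: str
    position: int = 0
    handled: list = field(default_factory=list)

    def poll(self, log):
        batch = log.read_from(self.position)
        for message in batch:
            self.handled.append(message)
        # Advancing the position is the consumer's job, not the log's.
        self.position += len(batch)
        return len(batch)


log = Log()
for n in range(3):
    log.append(f"event-{n}")

billing = Consumer("billing")
audit = Consumer("audit")

billing.poll(log)        # billing reads the first three messages
log.append("event-3")
audit.poll(log)          # audit starts from position 0 and reads all four
billing.poll(log)        # billing resumes from its own position and reads one
```

Because the position lives with the consumer, adding a new subscriber needs no change to the log, which is the essence of Pub/Sub over a dumb pipe.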

Here are some ways in which they differ:

Kafka is primarily a transient message transfer technology with (usually) temporary storage. Message DB is primarily a database technology that also serves the needs of message transfer.
Kafka is ideally suited for transient message transfer and queueing. Message DB is ideally suited for permanent message storage, retrieval, and event sourcing, in addition to message transfer and queueing.
Kafka is a message broker. Message DB is a message store.
Kafka is entirely its own technology. Message DB is Postgres.
Kafka organizes messages primarily by topic. Message DB organizes messages primarily by entity streams, and then by category (which is analogous to a topic).
Kafka topics are established statically before runtime, and assigned to specific servers. Message DB streams are established dynamically at runtime by writing a message, with no operations and configuration overhead.
Kafka has a number of distributed, moving parts, including message brokers and a centralized repository of the moving parts. Message DB is just Postgres.
Kafka is entirely distributed and inclined toward availability. Message DB is monolithic and inclined toward consistency.
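To make the stream and category distinction above more concrete, here is a minimal in-memory sketch in Python. It is not Message DB's API; the `MessageStore` class and its methods are hypothetical, and it assumes the naming convention where a stream name such as `account-123` belongs to the category `account`. Note that a stream comes into existence simply by writing to it.

```python
def category_of(stream_name):
    # Assumed convention: the category is the part of the name before the first dash.
    return stream_name.split("-", 1)[0]


class MessageStore:
    """Hypothetical in-memory stand-in for a message store."""

    def __init__(self):
        self.messages = []  # one global, append-only sequence

    def write(self, stream_name, message_type, data):
        # No prior declaration is needed: the first write creates the stream.
        position = sum(1 for m in self.messages if m["stream_name"] == stream_name)
        self.messages.append({
            "global_position": len(self.messages),
            "stream_name": stream_name,
            "position": position,
            "type": message_type,
            "data": data,
        })
        return position

    def get_stream(self, stream_name):
        # Messages for a single entity, in the order they were written.
        return [m for m in self.messages if m["stream_name"] == stream_name]

    def get_category(self, category):
        # Messages from every stream in the category, in global order.
        return [m for m in self.messages if category_of(m["stream_name"]) == category]


store = MessageStore()
store.write("account-123", "Opened", {})
store.write("account-456", "Opened", {})
store.write("account-123", "Deposited", {"amount": 10})

account_123 = [m["type"] for m in store.get_stream("account-123")]
all_accounts = [m["type"] for m in store.get_category("account")]
```

Reading a single stream is what event sourcing needs (the history of one entity), while reading a category is what a Pub/Sub subscriber needs (everything of one kind).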

These distinctions aren’t necessarily black and white, and this list is far from exhaustive. They’re reasonably good guidelines, but they don’t always hold true in all circumstances, in all topologies, and at all scales.

The decisive difference between Kafka and Message DB is the scale that they’re intended to address.

Kafka’s architecture allows super-massive scale through horizontal partitioning, static topology, and quorum replication. It’s because of the scale needs that Kafka has more moving parts and more complications. That complication has a price that can’t reasonably be amortized by the vast majority of applications that can presently benefit from the evented paradigm.

A Kafka system could not be tuned and shaped to address the extraordinary needs of extraordinary scale if it did not expose its parts as individually controllable subsystems. Each part needs to be deployed independently, configured, networked, and registered with the ZooKeeper cluster, which is itself deployed, configured, and networked independently. There’s significantly more overhead with Kafka.

That said, there are also cloud hosting options for Kafka that minimize the operations overhead of Kafka deployments. And again, this overhead is a necessary consequence of the use cases that Kafka is built for, so it shouldn’t be seen as a detriment if you find yourself facing the scenarios that are Kafka’s strengths.

Postgres is also scalable, but likely not as scalable as Kafka at the extreme upper end of scale. It’s more work to scale Postgres horizontally, and it’s more work to configure it for high availability. But if Postgres (or a similar store) is already something that you use in your shop, then chances are it’s already sufficient for your uses as a message store, event store, and message transport.

If you were to run one Kafka node (or a small number of Kafka nodes), then it’s worth considering whether your demands would not be better served with Message DB.

Kafka’s origins are in extreme scale and extreme throughput applications, like the massive input queues at LinkedIn and Twitter, and operations of similar scale. If you take in as many messages as systems like LinkedIn and Twitter, then you’re at the scale where you need to be running Kafka, and you already know it. In that case, you’re probably already running Kafka.

At that scale, messages are only retained for a limited amount of time. There’s just too much throughput to use a message broker to both transfer such a volume of messages and simultaneously permanently store such a volume. In such situations, the message ingress and the message storage are run as independent infrastructures on different technologies. The critical function of such an architecture is to provide a massive enough pipe to capture inbound messages, and hold on to them just long enough for other systems to pick them up, process them, and store permanently before they are disposed of to make room for yet more tsunamis of inbound messages.

Message DB is a Postgres database. It’s used as both a message transport and a message and event store. It’s used in both Pub/Sub scenarios and event sourcing scenarios.

Kafka is a massive-scale message transport that isn’t ideally suited to event sourcing. However, Kafka can be used for event sourcing. The KStream and KTable abstractions in Kafka Streams for Java are great examples of this. But no matter how scaled back a Kafka deployment is, it inevitably has higher operational overhead than plain old Postgres. That overhead may be overkill when running Kafka in a scaled-back configuration for application-specific event sourcing or Pub/Sub at an average scale that Postgres can easily handle.

That said, KStream and KTable are brilliant examples of how to deal with event sourcing requirements at Kafka scale, and of the trade-offs made in the event sourcing patterns to achieve such scale.

Kafka simply may not be as natural a fit for application-level event sourcing and Pub/Sub at ordinary scales, even though it can be made to fit, just as Postgres is not as natural a fit for super-massive transient message transport, even though Postgres and Message DB can be brute-forced to serve such a scenario.

The deciding factor in whether Message DB or Kafka is a natural fit for your scenario is scale. Message DB is a good choice for ordinary scale, and Kafka is a good choice for extraordinary scale.

The common mistake over the past couple of years, though, is the conflation of event-based architecture with Kafka - especially for ordinary-scale Pub/Sub and event sourcing in microservices and applications. In such contexts, Message DB can be a much more appropriate option.

It also bears mentioning that if an ordinary-scale implementation begins to lean toward a need for massive scale, transitioning to Kafka later is still an option. The application architectures used in Message DB and Kafka applications are more similar to each other than they are to the ORM final-state storage typical of web MVC applications. The route from Message DB to Kafka is far more direct than from ORM+MVC to either Message DB or Kafka.

It’s important to note that since Message DB is Postgres, you can add Message DB to your Postgres application and safely write events to the store in the same atomic database transaction in which application data is written. If you already have applications built on Postgres and would like to integrate eventing and messaging incrementally and progressively, Message DB is an ideal way to start experimenting with events, event sourcing, and messaging, and to stage the transition incrementally.
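The idea of writing an event and application data in one atomic transaction can be sketched as follows. To keep the example runnable without a database server, it uses Python's built-in `sqlite3` module as a stand-in for Postgres, and the `accounts` and `messages` tables are hypothetical. With Message DB, the insert into `messages` would instead be a call to Message DB's write function inside the same Postgres transaction.

```python
import json
import sqlite3
import uuid

# SQLite in memory stands in for Postgres; both tables are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "id TEXT PRIMARY KEY, balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.execute(
    "CREATE TABLE messages ("
    "id TEXT PRIMARY KEY, stream_name TEXT NOT NULL, "
    "type TEXT NOT NULL, data TEXT NOT NULL)"
)
conn.execute("INSERT INTO accounts VALUES ('123', 100)")
conn.commit()


def withdraw(conn, account_id, amount):
    # `with conn` commits if the block succeeds and rolls back if it raises,
    # so the event and the balance change are stored together or not at all.
    with conn:
        conn.execute(
            "INSERT INTO messages VALUES (?, ?, ?, ?)",
            (
                str(uuid.uuid4()),
                f"account-{account_id}",
                "Withdrawn",
                json.dumps({"amount": amount}),
            ),
        )
        conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE id = ?",
            (amount, account_id),
        )


withdraw(conn, "123", 30)  # succeeds: one event written, balance reduced

try:
    withdraw(conn, "123", 500)  # violates the CHECK constraint
except sqlite3.IntegrityError:
    pass  # the whole transaction was rolled back, including the event

balance = conn.execute("SELECT balance FROM accounts WHERE id = '123'").fetchone()[0]
event_count = conn.execute("SELECT COUNT(*) FROM messages").fetchone()[0]
```

The failed withdrawal leaves no orphaned event behind. That guarantee is hard to get when the event goes to a separate broker and the application data goes to a database.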

Conclusion

The most important decision to make is whether or not to use Kafka for your eventing and messaging applications. The worst decision you can make is to use Kafka when your use case doesn’t absolutely demand it.

Evented architecture is a broad landscape. Make informed decisions when you choose your tools, as you’ll be feeling the repercussions of these decisions - whether good or bad - for a long time.

If you’re new to evented systems, don’t presume that evented systems means Kafka. As an industry, we’ve been doing evented systems for a long, long time - literally since the dawn of computing. Compared to the history of computing, Kafka is very new. That said, Kafka is also quite amazing and is a great advancement in messaging technology at extreme scale and throughput.

We’ve also been using data stores and databases for a very long time. We understand their characteristics and their limitations, when they’re the right choice and when they’re not.

For the vast majority of applications, especially those that are already well-served by something like Postgres, Message DB is likely the right choice - or at least a good choice and a good place to get started.

If you have any questions, join the Eventide Project’s Slack and talk to people who have been building on Message DB for years and who have built systems on Kafka. The devil’s in the details and we’d be happy to dig in to the details with you.