Monday, December 2, 2024

Zio-kafka, faster than java-kafka

TL;DR: Concurrency and pre-fetching give zio-kafka a higher consumer throughput than the default Java Kafka client for most workloads.

Zio-kafka is an asynchronous Kafka client based on the ZIO async runtime. It allows you to consume from Kafka with a ZStream per partition. ZStreams are streams on steroids; they have all the stream operators known from e.g. RxJava and Monix, and then some. ZStreams are backed by ZIO's super efficient runtime.

So the claim is that zio-kafka consumes faster than the regular Java client. But zio-kafka wraps the default java Kafka client. So how can it be faster? The trick is that with zio-kafka, processing (meaning your code) runs in parallel with the code that pulls records from the Kafka broker.

Obviously, there is some overhead in distributing the received records to the streams that need to process them. So the question is: when is the extra overhead of zio-kafka less than the gain from parallel processing? Can we estimate this? It turns out we can! For this estimate we use the benchmarks that zio-kafka runs from GitHub. In particular, we look at these two benchmarks with runs from 2024-11-30:

  • the throughput benchmark, which uses zio-kafka and takes around 592 ms
  • the kafkaClients benchmark, which uses the Kafka java client directly and takes around 544 ms

Both benchmarks:

  • run using JMH on a 4 core github runner with 16GB RAM
  • consume 50k records of ~512 bytes from 6 partitions
  • process the records by counting the number of incoming records, batch by batch
  • use the same consumer settings, in particular max.poll.records is set to 1000
  • run the broker in the same JVM as the consumer, so there is almost no networking overhead
  • do not commit offsets (but note that committing does not change the outcome of this article)

First we calculate the overhead of java-kafka. For this we assume that counting the number of records takes no time at all. This is reasonable: a count operation is just a few CPU instructions, nothing compared to fetching data over the network, even if the broker is on the same machine. Therefore, in the figure, the 'processing' rectangle collapses to a thick vertical line.


Polling and processing as blocks of execution on a thread/fiber.

We also assume that every poll returns the same number of records and takes the same amount of time. This is not how it works in practice, but when the records are already available we're probably not that far off. As the program consumes 50k records, and each poll returns a batch of 1k records, there are 50 polls. Therefore, the overhead of java-kafka is 544/50 ≈ 11 ms per poll.

Now we can calculate the overhead of zio-kafka. Again we assume that processing takes no time at all, so that every millisecond the zio-kafka benchmark takes longer can be attributed to zio-kafka's overhead. The zio-kafka benchmark runs 592 − 544 = 48 ms longer. Therefore zio-kafka's overhead is 48/50 ≈ 1 ms per poll.
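As a sanity check, the overhead numbers above can be reproduced with a few lines of Python (the timings are taken from the benchmark runs described earlier):

```python
# Benchmark timings (ms) from the 2024-11-30 runs described above.
java_total_ms = 544   # kafkaClients benchmark (plain Java client)
zio_total_ms = 592    # throughput benchmark (zio-kafka)

records = 50_000
max_poll_records = 1_000
polls = records // max_poll_records  # 50 polls

# With processing assumed to take no time, all time is client overhead.
java_overhead_per_poll = java_total_ms / polls                        # ≈ 10.9 ms
zio_extra_overhead_per_poll = (zio_total_ms - java_total_ms) / polls  # ≈ 0.96 ms

print(f"polls: {polls}")
print(f"java-kafka overhead per poll: {java_overhead_per_poll:.1f} ms")
print(f"zio-kafka extra overhead per poll: {zio_extra_overhead_per_poll:.2f} ms")
```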

Now let's look at a more realistic scenario where processing does take time.


Polling and processing as blocks of execution on a thread/fiber.

As you can see, a java-kafka program alternates between polling and processing, while the zio-kafka program processes the records in parallel, distributed over 6 streams (one stream per partition, each running independently on its own ZIO fiber). In the figure we assume that the work is evenly distributed over the streams; unless the load is totally skewed, this won't matter much in practice thanks to zio-kafka's per-partition pre-fetching. In this scenario the zio-kafka program is clearly much faster. But is this a realistic example? When do zio-kafka programs become faster than java-kafka programs?
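The two execution patterns in the figure can be captured in a toy model. The sketch below uses the same simplifying assumptions as the figures (every poll takes equally long, work is spread evenly over the streams); the poll and processing times are made-up illustrative values, not benchmark results:

```python
def java_kafka_time(polls, poll_ms, p_ms):
    """Sequential model: each cycle is one poll followed by processing the batch."""
    return polls * (poll_ms + p_ms)

def zio_kafka_time(polls, poll_ms, p_ms, overhead_ms, n_streams):
    """Parallel model: processing is spread evenly over n streams,
    at the cost of a per-poll distribution overhead."""
    return polls * (poll_ms + overhead_ms + p_ms / n_streams)

# Hypothetical numbers: 50 polls of 10 ms each, 6 ms processing per batch,
# 6 partitions, 1 ms zio-kafka overhead per poll.
polls, poll_ms, p_ms = 50, 10, 6
print(java_kafka_time(polls, poll_ms, p_ms))       # 800 ms total
print(zio_kafka_time(polls, poll_ms, p_ms, 1, 6))  # 600 ms total
```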

Observe that how long polling takes is not important for this question, because polling time is the same for both programs. So we only look at the time between polls. We write p for the processing time per batch; for the java-kafka program this is also the time between polls. For the zio-kafka program, the time between polls equals zio-kafka's overhead (o) plus the processing time per batch divided by the number of partitions (n). So we want to know for which p:

o + p/n ≤ p

This solves to:

p ≥ o · n / (n − 1)

For the benchmark we fill in the zio-kafka overhead (o = 1 ms) and the number of partitions (n = 6) and we get: p ≥ 1.2 ms. (Remember, this is processing time per batch of 1000 records.)

In the previous paragraphs we implicitly assumed that processing is IO intensive, so that all streams can truly run in parallel. When the processing is compute intensive, ZIO cannot run more fibers in parallel than the number of cores available. In our example we have 4 cores; using n = 4 gives p ≥ 1.3 ms, which is still a very low tipping point.
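The tipping-point formula is easy to evaluate for both cases. A small sketch (the function name is my own, not from any library):

```python
def tipping_point_ms(overhead_ms, n):
    """Minimum processing time per batch for zio-kafka to win: p >= o * n / (n - 1)."""
    return overhead_ms * n / (n - 1)

# o = 1 ms per poll, as derived from the benchmarks above.
print(round(tipping_point_ms(1, 6), 1))  # 1.2 ms: 6 partitions, IO-bound processing
print(round(tipping_point_ms(1, 4), 1))  # 1.3 ms: capped at 4 cores for CPU-bound work
```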

Conclusion

For IO-bound loads, or with enough cores available, even quite low processing times per batch make zio-kafka consume faster than the plain Java Kafka library. The plain Java consumer is only faster for trivial processing tasks like counting.

If you'd like to know what this looks like in practice, you can take a look at this real-world example application: Kafka Big-Query Express. This is a zio-kafka application recently open sourced by Adevinta. Here is the Kafka consumer code. (Disclaimer: I designed and partly implemented this application.)