Kafka Message Key Hashing

In Kafka, each message (record) has a value and an optional key.

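  • key == null : messages are spread across the partitions of the topic. Older producers did this round-robin; since Kafka 2.4 the default is a "sticky" strategy that fills a batch for one partition before moving on to another.
  • key != null : all messages that share the same key are sent to and stored in the same partition, as long as the number of partitions in the topic does not change.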

** A key can be anything that identifies a message (a string, a numeric value, a binary value, etc.). It is serialized to bytes before it is hashed.

Key hashing is the process of mapping a key to a partition. A Kafka partitioner is the piece of code that takes a record and decides which partition to send it to.

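** In the default Kafka partitioner, the serialized key bytes are hashed with the murmur2 algorithm (MurmurHash2).

targetPartition = Utils.toPositive(Utils.murmur2(keyBytes)) % numPartitions

** Utils.toPositive clears the sign bit of the hash, so the result always falls in the range 0 to numPartitions - 1.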
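The mapping from key bytes to a partition can be sketched in plain Java, without the Kafka client on the classpath. This is only an illustration: the class name KeyHashSketch is hypothetical, and the murmur2 method is a hand-written port meant to mirror Kafka's Utils.murmur2, so check it against your kafka-clients version before relying on exact partition numbers. The computation it follows is Utils.toPositive(Utils.murmur2(keyBytes)) % numPartitions.

```java
import java.nio.charset.StandardCharsets;

// Illustrative sketch only; this is not the Kafka client.
final class KeyHashSketch {

    // 32-bit murmur2, written to mirror Kafka's Utils.murmur2.
    // Assumption: a hand-written port; verify against kafka-clients for exact values.
    static int murmur2(byte[] data) {
        final int length = data.length;
        final int seed = 0x9747b28c;
        final int m = 0x5bd1e995;
        final int r = 24;

        int h = seed ^ length;

        // Mix the input four bytes at a time (little-endian).
        for (int i = 0; i < length / 4; i++) {
            final int i4 = i * 4;
            int k = (data[i4] & 0xff)
                    + ((data[i4 + 1] & 0xff) << 8)
                    + ((data[i4 + 2] & 0xff) << 16)
                    + ((data[i4 + 3] & 0xff) << 24);
            k *= m;
            k ^= k >>> r;
            k *= m;
            h *= m;
            h ^= k;
        }

        // Mix the remaining 1 to 3 bytes, if any.
        final int tail = length & ~3;
        final int rem = length % 4;
        if (rem == 3) {
            h ^= (data[tail + 2] & 0xff) << 16;
        }
        if (rem >= 2) {
            h ^= (data[tail + 1] & 0xff) << 8;
        }
        if (rem >= 1) {
            h ^= data[tail] & 0xff;
            h *= m;
        }

        // Final avalanche.
        h ^= h >>> 13;
        h *= m;
        h ^= h >>> 15;
        return h;
    }

    // Clears the sign bit so the result is never negative.
    // Math.abs is avoided because Math.abs(Integer.MIN_VALUE) is still negative.
    static int toPositive(int number) {
        return number & 0x7fffffff;
    }

    // Maps serialized key bytes to a partition index in [0, numPartitions).
    static int partitionFor(byte[] keyBytes, int numPartitions) {
        return toPositive(murmur2(keyBytes)) % numPartitions;
    }

    public static void main(String[] args) {
        byte[] key = "12345".getBytes(StandardCharsets.UTF_8);
        System.out.println("partition = " + partitionFor(key, 6));
    }
}
```

Two details matter here: the modulo is taken over numPartitions (not numPartitions - 1), so every partition is reachable, and the sign bit is masked off rather than passed through Math.abs, so no hash value can produce a negative index.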

Example Flow (a topic with 2 partitions):

  • Message 1: account_id = 12345. hash(12345) % 2 might result in 0 -> Partition 0
  • Message 2: account_id = 67890. hash(67890) % 2 might result in 1 -> Partition 1
  • Message 3: account_id = 12345 (same as Message 1). hash(12345) % 2 still results in 0 -> Partition 0 again, so all messages with account_id = 12345 land in the same partition, in the order they were sent.
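The flow can be traced with a small simulation. This is a sketch under stated assumptions: Arrays.hashCode stands in for murmur2, so the partition numbers it prints will generally differ from what a real Kafka producer picks; the class KeyedFlowSketch, its methods, and the "deposit"/"withdrawal" payloads are hypothetical and exist only for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical simulation of keyed partitioning; this is not Kafka code.
final class KeyedFlowSketch {

    // Stand-in hash: Arrays.hashCode is NOT murmur2, so real Kafka may choose other partitions.
    static int partitionFor(String key, int numPartitions) {
        int hash = Arrays.hashCode(key.getBytes(StandardCharsets.UTF_8));
        return (hash & 0x7fffffff) % numPartitions;
    }

    // Appends each {key, payload} message to the log of the partition chosen for its key.
    static List<List<String>> route(String[][] messages, int numPartitions) {
        List<List<String>> partitions = new ArrayList<>();
        for (int p = 0; p < numPartitions; p++) {
            partitions.add(new ArrayList<>());
        }
        for (String[] message : messages) {
            String key = message[0];
            partitions.get(partitionFor(key, numPartitions)).add(key + ":" + message[1]);
        }
        return partitions;
    }

    public static void main(String[] args) {
        String[][] messages = {
            {"12345", "deposit"},     // Message 1
            {"67890", "deposit"},     // Message 2
            {"12345", "withdrawal"},  // Message 3, same key as Message 1
        };
        List<List<String>> partitions = route(messages, 2);
        for (int p = 0; p < partitions.size(); p++) {
            System.out.println("Partition " + p + ": " + partitions.get(p));
        }
    }
}
```

Whichever partition the key 12345 hashes to, both of its messages end up in that one partition's log, with the deposit ahead of the withdrawal. Whether 67890 shares that partition depends on the hash.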

Benefits of Keyed Partitioning:

  • Message Ordering: Kafka guarantees ordering only within a partition. Because all messages with the same key go to the same partition, consumers read them in the order they were written.
  • Data Locality: Within a consumer group, each partition is read by a single consumer, so all messages for a given key are handled by the same consumer. This makes it possible to process related messages together and keep per-key state local.
