Exactly-Once Event Processing E2E: Bridging Apache Flink and Kafka for Reliable Data Streams

Day 1 | 12:30 | 00:30 | UB5.132 | Adi Polak


Achieving exactly-once semantics is a cornerstone of reliable event streaming systems, but the challenge grows when the guarantee must hold across the entire pipeline: from data ingestion in Apache Kafka, through stateful processing in Apache Flink, and back to Kafka or another sink. In this talk, we’ll explore the intricacies of designing an end-to-end system that maintains data integrity and correctness without compromising scalability.

We’ll dive into:
* Kafka’s two-phase commit transactional guarantees and how they align with Flink’s checkpointing mechanisms.
* Flink’s end-to-end (E2E) two-phase commit protocol.
* Practical strategies to address challenges like fault tolerance, recovery, and latency trade-offs.

This session will explore the implementation of two-phase commit (2PC) in Flink and Kafka, the magic files, how the protocol is coordinated across two distributed systems, and the challenges you will face when implementing exactly-once E2E event processing in your own systems.
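
As a rough illustration of the idea behind a checkpoint-aligned two-phase commit, here is a minimal Python sketch. It is a toy simulation only: it uses neither the Flink nor the Kafka API, and the names `TransactionalSink`, `run`, `checkpoint_every`, and `crash_after` are hypothetical, introduced just for this example. It assumes a replayable source (records can be re-read from a stored offset) and a sink that hides written records until they are committed.

```python
from dataclasses import dataclass, field


@dataclass
class TransactionalSink:
    """Toy transactional sink (hypothetical; not a Kafka client)."""
    committed: list = field(default_factory=list)  # visible to downstream readers
    pending: dict = field(default_factory=dict)    # checkpoint id -> pre-committed records
    buffer: list = field(default_factory=list)     # records in the open transaction

    def write(self, record):
        self.buffer.append(record)

    def pre_commit(self, checkpoint_id):
        # Phase 1: close the open transaction and park it until the checkpoint completes.
        self.pending[checkpoint_id] = self.buffer
        self.buffer = []

    def commit(self, checkpoint_id):
        # Phase 2: make the records visible. Safe to call twice for the same id.
        records = self.pending.pop(checkpoint_id, None)
        if records is not None:
            self.committed.extend(records)

    def abort_open(self):
        # On failure, discard everything written since the last pre-commit.
        self.buffer = []


def run(records, checkpoint_every, crash_after=None):
    """Process records, checkpointing every `checkpoint_every` records.

    If `crash_after` is set, simulate one failure when that many records
    have been read, then recover from the last completed checkpoint.
    """
    sink = TransactionalSink()
    checkpointed_offset = 0  # source position stored in the last checkpoint
    checkpoint_id = 0
    position = 0
    crashed = False
    while position < len(records):
        if crash_after is not None and not crashed and position == crash_after:
            crashed = True
            sink.abort_open()               # uncommitted output is dropped
            position = checkpointed_offset  # source rewinds and replays
            continue
        sink.write(records[position])
        position += 1
        if position % checkpoint_every == 0 or position == len(records):
            checkpoint_id += 1
            sink.pre_commit(checkpoint_id)  # phase 1, at the checkpoint barrier
            checkpointed_offset = position  # checkpoint is now durable
            sink.commit(checkpoint_id)      # phase 2, after checkpoint completion
    return sink.committed


if __name__ == "__main__":
    print(run(list(range(10)), checkpoint_every=3, crash_after=5))
```

In this sketch, records written after the last checkpoint are replayed following the simulated crash, but because the aborted transaction was never committed, each record becomes visible downstream only once. A real deployment also has to handle failures between the two phases, transaction timeouts, and readers that must skip uncommitted data, none of which is modeled here.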