
Real-Time Data Pipelines: Why Every AI System Needs One

MetaSys Editorial Team · April 17, 2026 · 8 min read

Most systems described as "real-time" are not. They are micro-batch systems that refresh every 30 seconds or every few minutes, a meaningful distinction because the cost and complexity of true streaming infrastructure are significant. Before committing to a streaming architecture, the most important question is: what latency does your use case actually require? The answer to that question should drive every subsequent architectural decision.

Batch, Micro-Batch, and Streaming: What Each Actually Means

Batch processing collects data over a period of time and processes it as a group. A nightly ETL job that summarizes the previous day's transactions is batch processing. It is simple, cheap, reliable, and entirely appropriate for many use cases. The output is stale by the time it is ready, but if you are building a weekly executive dashboard, staleness is not your problem.

Micro-batch processing runs the same pattern on a much shorter cycle. Apache Spark's Structured Streaming, for example, operates in micro-batches: it collects events over a configurable window (as short as a few hundred milliseconds) and processes them as a mini-batch. Most systems marketed as real-time dashboards are micro-batch under the hood. The latency is measured in seconds, not milliseconds.
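
To make the distinction concrete, here is a minimal PySpark Structured Streaming sketch, assuming a local Kafka broker, a hypothetical "events" topic, and the spark-sql-kafka connector package on the classpath. Each trigger cycle is one mini-batch:

```python
# Minimal PySpark Structured Streaming sketch of micro-batch processing.
# The broker address and topic name are hypothetical.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("micro-batch-demo").getOrCreate()

events = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "localhost:9092")
          .option("subscribe", "events")
          .load())

# Each trigger collects whatever arrived since the last cycle and
# processes it as one mini-batch: latency in seconds, not milliseconds.
query = (events.selectExpr("CAST(value AS STRING) AS value")
         .writeStream
         .format("console")
         .trigger(processingTime="1 second")
         .start())

query.awaitTermination()
```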

True streaming processes each event as it arrives. Apache Kafka paired with Apache Flink is the canonical production stack for this. Latency is measured in tens of milliseconds end-to-end. This is what you need for fraud detection. It is not what you need for a dashboard that refreshes every 30 seconds.

When Real-Time Actually Matters

The use cases that genuinely require sub-second processing are narrower than most teams assume:

  • Fraud detection. A payment authorization decision must happen in milliseconds, before the transaction completes. If your fraud model runs in batch overnight, it identifies fraud after the money has moved. This is a case where latency has direct financial consequences.
  • Dispatch optimization. When a driver completes a delivery or a vehicle goes offline, routing decisions need to update within seconds. A ten-minute lag in a dispatch system causes cascading delays across the entire operation.
  • Real-time personalization at the session level. If a user is actively navigating your product and their actions should influence what they see next within the same session, sub-second latency matters. If you are personalizing tomorrow's email campaign, batch is fine.
  • Operational alerting. Systems that need to trigger an action when a threshold is breached (a temperature sensor, an error rate spike, a trading limit) need streaming. A dashboard that a human checks periodically can tolerate minutes of latency.

Operational dashboards that people review periodically are almost always fine with five to fifteen minutes of latency. Building a streaming pipeline to power a dashboard that gets checked three times a day is engineering waste.

The Streaming Stack

For the teams our data and AI platforms practice works with, the core components of real streaming infrastructure are well established. Kafka is the message transport layer. It provides a durable, ordered, distributed log. Producers write events to topics; consumers read from those topics at their own pace. Kafka decouples producers from consumers, which means your fraud detection system does not need to know anything about the payment system that generates the events.
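
A minimal sketch of the producer side using the confluent-kafka Python client, against a hypothetical local broker and "payments" topic; the payment service emits the event and moves on:

```python
# Producer side: emit an event and move on. The producer knows nothing
# about the consumers downstream. Broker and topic are hypothetical.
from confluent_kafka import Producer

producer = Producer({"bootstrap.servers": "localhost:9092"})
producer.produce("payments", key=b"user-42", value=b'{"amount": 99.50}')
producer.flush()  # block until the broker confirms delivery
```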

Apache Flink is the processing engine of choice for stateful stream processing. It can compute running totals, join streams, detect patterns across events, and manage complex windowing logic. Flink's exactly-once semantics are production-grade. Spark Structured Streaming is a viable alternative for teams that already operate Spark infrastructure and can tolerate micro-batch latencies.
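
A sketch of what stateful processing looks like in PyFlink: a running total per key. The bounded input is for illustration only; in production the source would be a Kafka connector, and the apache-flink package is assumed:

```python
# Keyed, stateful stream processing in PyFlink: a running total per key.
from pyflink.common import Types
from pyflink.datastream import StreamExecutionEnvironment

env = StreamExecutionEnvironment.get_execution_environment()

payments = env.from_collection(
    [("user-a", 10.0), ("user-b", 5.0), ("user-a", 2.5)],
    type_info=Types.TUPLE([Types.STRING(), Types.DOUBLE()]),
)

# key_by partitions the stream; reduce keeps per-key state (the running
# total) that Flink checkpoints and restores on failure.
running_totals = (payments
                  .key_by(lambda e: e[0])
                  .reduce(lambda acc, e: (acc[0], acc[1] + e[1])))

running_totals.print()
env.execute("running-totals")
```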

ksqlDB provides a SQL interface over Kafka streams. For teams that need stream transformations but do not want to write Java or Scala Flink jobs, ksqlDB lets analysts write SQL queries that run continuously over a stream. It is not as powerful as Flink for complex stateful operations, but it is significantly easier to operate.

Schema registries sit alongside Kafka to manage event schemas. When a producer changes the structure of an event, the schema registry enforces compatibility rules to prevent downstream consumers from breaking. Confluent Schema Registry is the standard. Without a schema registry, schema drift becomes a production incident waiting to happen.
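
A hedged sketch of how the registry fits into the producer path with the confluent-kafka client; the registry URL, topic, and schema are illustrative:

```python
# Serializing events through Confluent Schema Registry. The serializer
# registers the schema (subject "payments-value") on first use;
# incompatible later changes are rejected by the registry rather than
# surfacing as a broken consumer in production.
from confluent_kafka import Producer
from confluent_kafka.schema_registry import SchemaRegistryClient
from confluent_kafka.schema_registry.avro import AvroSerializer
from confluent_kafka.serialization import SerializationContext, MessageField

schema_str = """
{
  "type": "record",
  "name": "Payment",
  "fields": [
    {"name": "user_id", "type": "string"},
    {"name": "amount", "type": "double"}
  ]
}
"""

registry = SchemaRegistryClient({"url": "http://localhost:8081"})
serializer = AvroSerializer(registry, schema_str)

producer = Producer({"bootstrap.servers": "localhost:9092"})
event = {"user_id": "user-42", "amount": 99.50}

producer.produce(
    "payments",
    value=serializer(event, SerializationContext("payments", MessageField.VALUE)),
)
producer.flush()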

Event-Driven Architecture and Why Events Are the Right Primitive

An event is a record that something happened: a payment was initiated, a package was delivered, a user clicked a button. Events are immutable facts about the past. Unlike database rows that get updated in place, events accumulate in a log. This is the natural shape of reality: things happen in sequence.

Event-driven systems are loosely coupled because producers do not call consumers directly. A payment service emits an event; a fraud service, an audit service, and a notification service all consume it independently. Adding a new consumer does not require modifying the producer. This decoupling is one of the core engineering advantages of event-driven architectures, independent of any latency requirements.
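
In Kafka terms, that decoupling is just a consumer group per service. A short sketch with hypothetical group names:

```python
# Each downstream service reads the same topic under its own consumer
# group, independently and at its own pace.
from confluent_kafka import Consumer

fraud = Consumer({"bootstrap.servers": "localhost:9092",
                  "group.id": "fraud-service",
                  "auto.offset.reset": "earliest"})
audit = Consumer({"bootstrap.servers": "localhost:9092",
                  "group.id": "audit-service",
                  "auto.offset.reset": "earliest"})

# Kafka tracks offsets per group, so adding the audit service required
# no change to the payment producer.
fraud.subscribe(["payments"])
audit.subscribe(["payments"])
```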

Exactly-Once Semantics: What It Is and What It Costs

In a distributed system, failures happen: a node crashes, a network partition occurs, a consumer restarts. When this happens, events might be processed zero times (lost), once (ideal), or more than once (duplicated). Exactly-once semantics is the guarantee that each event is processed exactly once, even in the presence of failures.

Kafka achieves exactly-once through idempotent producers (which deduplicate retries at the broker level) and transactional APIs (which allow atomic writes across multiple partitions). Flink's checkpointing mechanism, combined with Kafka's transactional APIs, extends exactly-once to stateful stream processing.
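
A sketch of those building blocks with the confluent-kafka client; the transactional.id is hypothetical and must stay stable across process restarts:

```python
# An idempotent, transactional Kafka producer. Setting transactional.id
# enables idempotence implicitly.
from confluent_kafka import Producer

producer = Producer({
    "bootstrap.servers": "localhost:9092",
    "transactional.id": "fraud-scorer-1",
})

producer.init_transactions()
producer.begin_transaction()
try:
    producer.produce("fraud-scores", key=b"txn-123", value=b'{"score": 0.97}')
    # In a consume-transform-produce loop you would also call
    # send_offsets_to_transaction() here, so input offsets commit
    # atomically with the output writes.
    producer.commit_transaction()
except Exception:
    producer.abort_transaction()
    raise
```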

The cost is real. Transactional producers have lower throughput than non-transactional ones. Checkpointing introduces latency. The operational complexity of debugging exactly-once failures is significant. For many use cases, at-least-once processing with idempotent downstream consumers is the right trade-off.
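
What an idempotent downstream consumer looks like, as a minimal sketch; the event_id field and side effect are hypothetical stand-ins, and production code would use a unique database key rather than an in-memory set:

```python
# At-least-once delivery plus an idempotent handler: duplicates are
# tolerated because applying the same event twice has no extra effect.
processed_ids = set()  # in production: a unique key in your database


def apply_side_effect(event: dict) -> None:
    print("applied", event["event_id"])  # placeholder for real work


def handle(event: dict) -> None:
    if event["event_id"] in processed_ids:
        return  # duplicate delivery; safe to drop
    apply_side_effect(event)
    processed_ids.add(event["event_id"])
```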

Why Streaming Is Operationally Harder Than Batch

Batch pipelines fail in predictable ways: a job either succeeds or fails, and a failed run can simply be rerun. Streaming pipelines fail in strange, subtle ways that are harder to detect and diagnose.

Ordering guarantees are non-trivial. Kafka guarantees ordering within a partition, but not across partitions. If events from the same user are spread across multiple partitions, processing them in order requires explicit handling.
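
The standard fix is to choose the message key deliberately: Kafka hashes the key to pick a partition, so keying by user ID keeps each user's events in one partition, in order. A short sketch with a hypothetical topic and key:

```python
# All three events share a key, land in the same partition, and are
# therefore consumed in the order they were produced.
from confluent_kafka import Producer

producer = Producer({"bootstrap.servers": "localhost:9092"})
for action in [b"login", b"add_to_cart", b"checkout"]:
    producer.produce("clicks", key=b"user-42", value=action)
producer.flush()
```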

Late data is a constant challenge. In a streaming system, events can arrive out of order due to network delays. Windowing logic (compute the sum of all events in the last five minutes) must decide what to do when an event from four minutes ago arrives two minutes late. Watermarks define the threshold beyond which late data is ignored. Setting the right watermark requires understanding your data's latency profile.
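
In Spark Structured Streaming, that threshold is a one-line declaration. A sketch assuming a hypothetical event_time column derived from the Kafka message timestamp:

```python
# Windowed counts with a two-minute watermark: events arriving more than
# two minutes behind the max observed event time are dropped.
from pyspark.sql import SparkSession
from pyspark.sql.functions import window, col

spark = SparkSession.builder.appName("watermark-demo").getOrCreate()

events = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "localhost:9092")
          .option("subscribe", "events")
          .load()
          .selectExpr("CAST(value AS STRING) AS value",
                      "timestamp AS event_time"))

counts = (events
          .withWatermark("event_time", "2 minutes")
          .groupBy(window(col("event_time"), "5 minutes"))
          .count())

query = (counts.writeStream
         .outputMode("update")
         .format("console")
         .start())
```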

Checkpointing is the mechanism by which streaming systems recover from failures without reprocessing all historical data. But checkpoints consume storage, introduce latency, and add complexity to the operational model. Getting checkpointing right takes time.
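
In Flink, enabling checkpointing is one call; tuning it is the ongoing work. A minimal PyFlink sketch:

```python
# Snapshot all operator state every 10 seconds. The interval is a
# trade-off: shorter means less reprocessing after a failure, but more
# I/O and latency overhead.
from pyflink.datastream import StreamExecutionEnvironment

env = StreamExecutionEnvironment.get_execution_environment()
env.enable_checkpointing(10_000)  # interval in milliseconds
```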

Cloud-Managed vs Self-Managed Kafka

Managing Kafka yourself means managing ZooKeeper (or KRaft in newer versions), broker sizing, topic replication, disk capacity, and upgrades. For most teams, this is not the best use of engineering time.

Confluent Cloud provides fully managed Kafka with a consumption-based pricing model. Amazon MSK manages the Kafka brokers while you retain control of configuration. Google Cloud Pub/Sub is a different model entirely (per-message acknowledgment, with optional push delivery, rather than a consumer-managed offset over a partitioned log) but serves similar use cases and integrates naturally with the Google Cloud data stack.

Self-managed Kafka is justified when you need cost control at very high throughput (managed services become expensive at scale), when your compliance requirements prevent using cloud-managed services, or when you need configuration control that managed services do not expose.

The Data Freshness SLA: Design to the Requirement

The right way to approach streaming architecture is to start with a precise latency requirement. Not "real-time" but "fraud decisions within 150 milliseconds of transaction initiation" or "dispatch re-optimization within 10 seconds of a route change event." Precise requirements let you design the simplest system that meets them.

A system with a 10-second SLA can use micro-batch Spark Structured Streaming. A system with a 500-millisecond SLA needs Flink. A system with a 15-second SLA might use a simple Kafka consumer with no framework at all, as the sketch below shows. Over-engineering for a tighter SLA than you need wastes both compute spend and engineering time.
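
For that 15-second case, the no-framework option really is this small: a plain poll loop with the confluent-kafka client and a hypothetical topic. No Flink, no Spark, just a process you can reason about:

```python
# A bare Kafka consumer loop: poll, process, repeat.
from confluent_kafka import Consumer

consumer = Consumer({"bootstrap.servers": "localhost:9092",
                     "group.id": "dashboard-feeder",
                     "auto.offset.reset": "earliest"})
consumer.subscribe(["events"])

try:
    while True:
        msg = consumer.poll(1.0)
        if msg is None or msg.error():
            continue
        print(msg.value())  # placeholder for the actual update logic
finally:
    consumer.close()
```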

Streaming Data for AI and Real-Time Inference

Streaming infrastructure becomes directly relevant to agentic AI systems when those systems need to act on current state rather than historical state. Online feature stores (Feast, Tecton, Hopsworks) bridge the streaming and ML worlds: they ingest events from Kafka, compute features in real time, and serve those features to models at inference time with sub-100-millisecond latency.
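
A hedged sketch of what feature retrieval at inference time looks like with Feast; the repo path, feature view, and feature names are illustrative, not a reference setup:

```python
# Fetch the freshest streaming-materialized features for one entity,
# ready to pass into the fraud model at inference time.
from feast import FeatureStore

store = FeatureStore(repo_path=".")

features = store.get_online_features(
    features=["user_activity:txn_count_5m",
              "user_activity:avg_amount_5m"],
    entity_rows=[{"user_id": "user-42"}],
).to_dict()
```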

Without an online feature store, teams often fall back to precomputing features in batch and serving stale values. For many ML use cases this is acceptable. For fraud detection or real-time pricing, it is not.

The integration between streaming infrastructure and ML inference is one of the harder architectural problems in production AI. For a deeper look at what the full stack looks like, see our guide on how to build an AI-ready data platform.

The summary: if your AI system's decisions need to reflect what happened in the last minute rather than the last day, you need streaming infrastructure. If last day's data is sufficient, build something simpler and invest the saved engineering capacity elsewhere.

Work with MetaSys

Ready to put this into practice?

Talk to an AI architect about your specific context. No pitch deck. Just a direct conversation about what makes sense for your business.