Breaking the Latency Barrier: Dynamic Segmentation at Millisecond Speed

Real-time user segmentation is critical for delivering responsive, personalised experiences, but traditional SQL systems fall short at scale. This article explores a high-performance architecture using Redis, DynamoDB, horizontal data modelling with bitfields, and lightweight machine learning to achieve sub-50ms segment lookups. Backed by real-world implementation, it offers practical strategies for engineering teams building fast, scalable personalisation engines.

Still using SQL for user segmentation? That’s your bottleneck.

When your users expect personalisation to respond in real time, relying on vertical SQL tables, indexes, and joins becomes an obstacle. Traditional systems weren’t designed to respond to live behaviour changes with millisecond-level performance.

This article outlines a scalable, production-tested architecture for real-time segmentation using distributed key-value stores like Redis and DynamoDB. Combined with lightweight machine learning inference and horizontal data modelling, this approach delivers user segment updates and lookups in under 50 milliseconds, even at high scale.

Where SQL Falls Behind

Let’s start with what a typical segmentation schema looks like in a relational database:

```sql
user_id  | segment_id
---------+-----------
user-123 | seg-123
user-123 | seg-345
user-123 | seg-678
```

To retrieve segments for a user, the system queries:

```sql
SELECT segment_id FROM user_segments WHERE user_id = 'user-123';
```

This works fine for small datasets. But as the application grows to millions of users, each belonging to many segments, and concurrent request volume climbs, every lookup carries a significant performance cost.

Even with indexing, relational databases struggle with read latency when segment membership is highly dynamic and accessed frequently. And that’s exactly the use case for personalisation features like targeted offers, recommendations, or pricing adjustments.

These systems need to know: what segment does this user belong to right now?

A Simpler, Faster Model: Horizontal Segmentation

Instead of modelling segmentation vertically, we store it horizontally. Each user gets one record, with segment flags as columns:

```sql
user_id    | seg-123 | seg-345 | seg-678
-----------+---------+---------+--------
user-123   |    1    |    1    |    0
```

Now, looking up a user’s segments becomes a single key-based operation.

Distributed key-value stores like Redis or DynamoDB handle this model naturally. In Redis, you can store this as a hash. In DynamoDB, an item has dynamic attributes. Storing data using bitfields instead of full hashes further improves memory efficiency, especially at scale.

In production, this pattern allows you to retrieve an entire segment map for a user in one call, without joins, filters, or intermediate queries.
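As a minimal sketch of that single-call lookup, the horizontal record can be modelled as one hash per user. A plain Python dict stands in for the key-value store here (in Redis this would be an HGETALL, in DynamoDB a GetItem); the key prefix and segment names are illustrative.

```python
# One record per user: segment flags stored horizontally.
# A plain dict stands in for Redis/DynamoDB in this sketch.
store = {
    "seg:user-123": {"seg-123": 1, "seg-345": 1, "seg-678": 0},
}

def get_segments(user_id: str) -> list[str]:
    """Return a user's active segments with a single key lookup --
    no joins, filters, or intermediate queries."""
    record = store.get(f"seg:{user_id}", {})
    return [seg for seg, flag in record.items() if flag]

print(get_segments("user-123"))  # ['seg-123', 'seg-345']
```

Whatever the backing store, the shape of the operation is the same: one key in, the full segment map out.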

Industry Validation

Source: Eran Stiller

This isn’t just theoretical. Netflix, for example, uses Redis in its custom queuing system “Timestone” to maintain fast, deterministic access under high throughput. While the use case differs from segmentation, it demonstrates the scalability and speed of key-value-based architectures in real-world applications.

SQL vs. Horizontal KV: What You Gain

Here’s a side-by-side comparison to illustrate how the model shift improves system behaviour:

Feature               | SQL Vertical Model     | Horizontal KV Model
----------------------|------------------------|--------------------------------
Lookup Time           | Grows with table size  | Constant (single key read)
Write Overhead        | One insert per segment | One update per user
Real-Time Suitability | Limited                | Excellent
Schema Flexibility    | Rigid                  | High (supports dynamic fields)
Cost at Scale         | High indexing and I/O  | Predictable, compact memory use

This difference becomes visible as soon as your user base grows beyond a few hundred thousand active profiles, especially if each session triggers personalisation.

How Real-Time Segmentation Actually Works

Structure alone won’t keep segment data up to date. You need to assign segments dynamically, based on user behaviour. This is where machine learning comes in.

Here’s the high-level flow:

  • A user event occurs (e.g., login, click, purchase).
  • The system extracts features relevant to segmentation, including device, geography, and action history.
  • A fast machine learning model runs inference, typically LightGBM or logistic regression.
  • The resulting segment flags are updated in the Redis cache for real-time access and DynamoDB for persistence.

Because this is triggered by user behaviour, it reflects their most current status. There’s no need for nightly jobs or cron-based refresh cycles.
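The event-to-update flow above can be sketched end to end as follows. The feature names, the rule-based stand-in for the ML model, and the in-memory stores are all illustrative assumptions; in production the model would be LightGBM or logistic regression, and the stores would be Redis and DynamoDB.

```python
# Sketch of the event-driven segmentation flow:
# event -> feature extraction -> inference -> dual write.

cache = {}       # stand-in for Redis (real-time access)
persistent = {}  # stand-in for DynamoDB (persistence)

def extract_features(event: dict) -> dict:
    """Pull the segmentation-relevant features from a raw user event."""
    return {
        "device": event.get("device", "unknown"),
        "country": event.get("country", "unknown"),
        "purchases_30d": event.get("purchases_30d", 0),
    }

def infer_segments(features: dict) -> dict:
    """Rule-based stand-in for a fast model returning segment flags."""
    return {
        "seg-mobile": 1 if features["device"] == "mobile" else 0,
        "seg-frequent-buyer": 1 if features["purchases_30d"] >= 3 else 0,
    }

def on_user_event(user_id: str, event: dict) -> dict:
    """Triggered per meaningful event: infer flags, write to both stores."""
    flags = infer_segments(extract_features(event))
    cache[f"seg:{user_id}"] = flags       # fast-path reads
    persistent[f"seg:{user_id}"] = flags  # durability / fallback
    return flags

flags = on_user_event("user-123", {"device": "mobile", "purchases_30d": 5})
```

Because the write happens inside the event handler, the stores always reflect the user's most recent behaviour with no batch refresh in between.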

This architecture is especially beneficial when segment definitions are fluid, combining behavioural and contextual rules that are hard to predefine with SQL logic alone.

Architectural View: The Components in Action

A working implementation involves these key components:

  • Redis is used for sub-millisecond reads by downstream services like pricing, recommendations, or UI rendering.
  • DynamoDB provides durability, audit trails, and serves as a fallback when Redis misses.
  • Updates are streamed to both stores simultaneously to ensure consistency.

Optimisation Techniques That Make a Difference

Real-time performance relies on deliberate engineering. Here are a few practical methods we use to keep things running fast and stable:

1. Expire Short-Lived Segments

Not all segments need to live indefinitely. States like “active in the last 10 minutes” or “cart engagement this session” can expire naturally. Redis supports TTLs on keys (and, since Redis 7.4, on individual hash fields) to:

  • Reduce memory use
  • Avoid stale segment impact
  • Remove the need for manual cleanup

This keeps Redis lean and accurate.
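The expiry semantics can be sketched in a few lines. Redis does this natively with EXPIRE/SETEX (and HEXPIRE for hash fields in 7.4+); this in-memory version, with an illustrative key name, just shows the lazy-cleanup behaviour.

```python
import time

# Sketch of TTL-based expiry for short-lived segment flags.
# An in-memory stand-in for what Redis does with EXPIRE/SETEX.

class TTLStore:
    def __init__(self):
        self._data = {}  # key -> (value, expiry timestamp)

    def set(self, key, value, ttl_seconds):
        self._data[key] = (value, time.monotonic() + ttl_seconds)

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            del self._data[key]  # lazy cleanup on access, as Redis does
            return None
        return value

store = TTLStore()
store.set("seg:user-123:active-10m", 1, ttl_seconds=600)
```

Expired flags simply disappear on the next read, so no cleanup job ever has to run.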

2. Use Redis Cluster for Scale

Single-node Redis works until it doesn’t. Redis Cluster allows horizontal scaling by automatically sharding data across nodes.

With clustering, you get:

  • Higher throughput
  • Fault isolation
  • Predictable performance across high-concurrency reads

Sharding by hashed user ID ensures balance across the system.
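Redis Cluster decides placement by mapping each key to one of 16,384 hash slots using CRC16 (the XModem variant) modulo 16384, and slots are divided among nodes. A small sketch of that keyslot function:

```python
def crc16_xmodem(data: bytes) -> int:
    """CRC16-CCITT (XModem), the checksum Redis Cluster uses for keys."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            crc = ((crc << 1) ^ 0x1021) if crc & 0x8000 else (crc << 1)
            crc &= 0xFFFF
    return crc

def hash_slot(key: str) -> int:
    """Map a key to one of Redis Cluster's 16384 hash slots."""
    return crc16_xmodem(key.encode()) % 16384
```

Because user IDs hash roughly uniformly, keys spread evenly across slots and therefore across nodes. (Redis also honours hash tags: if a key contains `{...}`, only the tagged portion is hashed, which lets related keys land on the same node.)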

3. Store Segment Flags as Bitmaps

When tracking dozens or hundreds of binary segments, use Redis bitfields instead of full hashes or JSON blobs. They’re compact and efficient:

  • Minimal memory footprint
  • Fast, atomic reads and updates
  • Easy to retrieve and scan flags in bulk

This is a practical option for large-scale deployments with fixed segment lists.
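The bitmap layout is easy to picture: with a fixed segment list, each segment gets a bit offset, and a user's full membership fits in a handful of bytes. This sketch mirrors Redis SETBIT/GETBIT semantics in pure Python; the segment list is illustrative.

```python
# Segment flags packed into a bitmap, one bit per segment.
# Mirrors Redis SETBIT/GETBIT semantics with a plain bytearray.

SEGMENTS = ["seg-123", "seg-345", "seg-678"]  # fixed segment -> bit offset

def set_flag(bitmap: bytearray, offset: int, value: int) -> None:
    byte, bit = divmod(offset, 8)
    while len(bitmap) <= byte:
        bitmap.append(0)       # grow on demand, as Redis does
    mask = 1 << (7 - bit)      # Redis numbers bits from the most significant
    bitmap[byte] = (bitmap[byte] | mask) if value else (bitmap[byte] & ~mask)

def get_flag(bitmap: bytes, offset: int) -> int:
    byte, bit = divmod(offset, 8)
    if byte >= len(bitmap):
        return 0               # unset bits read as 0
    return (bitmap[byte] >> (7 - bit)) & 1

user_bits = bytearray()
set_flag(user_bits, SEGMENTS.index("seg-123"), 1)
set_flag(user_bits, SEGMENTS.index("seg-345"), 1)
# All three segments fit in a single byte instead of three hash fields.
```

A hundred binary segments is under 13 bytes per user this way, versus a field name plus value per segment in a hash or JSON blob.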

4. Distribute Keys Evenly

Avoid key hotspots by namespacing or hashing keys. For example:

  • Use prefixes like seg:user-123
  • Apply consistent hashing to user IDs

Even key distribution helps prevent uneven traffic spikes and node pressure.
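A quick way to sanity-check distribution is to hash a batch of namespaced keys into buckets and count. The prefix, node count, and use of MD5 here are illustrative assumptions (MD5 is chosen over Python's built-in `hash` because it is deterministic across runs):

```python
import hashlib

# Sketch: namespaced keys hashed into node buckets to verify even spread.

def node_for(key: str, num_nodes: int = 4) -> int:
    """Deterministically assign a key to one of num_nodes buckets."""
    digest = hashlib.md5(key.encode()).digest()
    return int.from_bytes(digest[:4], "big") % num_nodes

counts = [0] * 4
for i in range(10_000):
    counts[node_for(f"seg:user-{i}")] += 1
# With a good hash, each node receives roughly a quarter of the keys.
```

If one bucket's count is far off the others, the keying scheme (not the store) is usually the culprit.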

5. Run Inference Only When It Matters

Inference should be fast and event-driven. We use models like LightGBM because they’re optimised for tabular data and return predictions in under 10ms.

Only trigger inference when meaningful events occur (e.g., login, cart action), not for every minor interaction. This reduces load without affecting accuracy.

6. Maintain Dual Writes for Durability

While Redis is your primary lookup store, all writes should also go to DynamoDB for long-term consistency.

Use either:

  • A write-through pattern (direct to both), or
  • An event stream with write-ahead logs (e.g., Kafka)

That way, Redis stays fast, and DynamoDB ensures recovery and reliability if anything goes wrong.
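The write-through variant can be sketched as follows, with dict stand-ins replacing Redis and DynamoDB and error handling omitted for brevity:

```python
# Sketch of the write-through pattern: every segment update goes to the
# cache and the durable store in the same call, and reads fall back to
# the durable store on a cache miss.

cache = {}    # Redis stand-in: fast lookups
durable = {}  # DynamoDB stand-in: durability and fallback

def write_through(user_id: str, flags: dict) -> None:
    key = f"seg:{user_id}"
    durable[key] = dict(flags)  # persist first so a cache failure is safe
    cache[key] = dict(flags)

def read_with_fallback(user_id: str):
    key = f"seg:{user_id}"
    if key in cache:
        return cache[key]
    value = durable.get(key)      # cache miss: hit the durable store
    if value is not None:
        cache[key] = dict(value)  # repopulate the cache for next time
    return value

write_through("user-123", {"seg-123": 1, "seg-345": 1})
cache.clear()  # simulate a Redis restart or eviction
restored = read_with_fallback("user-123")
```

Writing to the durable store first means a crash between the two writes leaves the cache stale or empty, never the source of truth, and the fallback path repairs the cache on the next read.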

Proven in the Field: What We’ve Seen

At Ibotta, we use this architecture to manage real-time segmentation across more than 100 million users. These segments influence everything from in-app offers to email triggers.

Segment lookups are consistently delivered in under 25ms at the 95th percentile, even during high-traffic periods. And because inference is only triggered on behaviour, the system remains cost-effective and focused.

Other companies use similar ideas. LinkedIn, for instance, developed Apache Pinot to serve real-time analytics for features like “Who Viewed My Profile.” Pinot enables sub-second aggregation across massive datasets, different from segmentation, but rooted in the same principle: serving fast, dynamic results from distributed, pre-modelled structures.

Why This Matters Now

User behaviour shifts in milliseconds. Systems that can’t respond quickly fall behind not because they lack data, but because they can’t process and act on it fast enough. Whether you’re powering recommendations, pricing, targeting, or personalisation, segmentation needs to be live, context-aware, and highly efficient.

The approach outlined here is built from field experience, solving real bottlenecks in production environments with tens of millions of users. Horizontal data models, KV stores, and lightweight ML inference aren’t just architectural preferences. They’re practical choices that consistently deliver faster performance, lower operational friction, and more reliable personalisation.

This model supports the pace of modern applications. It’s fast enough for user expectations, simple enough for teams to manage, and flexible enough to evolve with new products.

Speed matters. So does having the right system behind it.

References

  1. Stiller, E. (2022): Netflix builds a custom high-throughput priority queue backed by Redis, Kafka, and Elasticsearch. Eran Stiller. https://eranstiller.com/news/netflix-builds-a-custom-high-throughput-priority-queue-backed-by-redis-kafka-and-elasticsearch
  2. Datacouncil.ai. (2025): Building real-time analytics applications using Apache Pinot: A case study of LinkedIn. https://www.datacouncil.ai/talks25/building-real-time-analytics-applications-using-apache-pinot-a-case-study-of-linkedin
Sheshank Kodam is a Staff Platform Engineer with over 12 years of experience building scalable, data-intensive systems. At Ibotta, he leads real-time targeting infrastructure serving over 100 million users. His work spans distributed storage, machine learning pipelines, and cloud-native architectures optimised for speed and reliability.