Real-Time vs Near-Real-Time — Why Milliseconds Matter in Fraud Prevention


"Near-real-time" is one of the most frequently misused terms in fraud technology marketing. Vendors describe systems with 2-second response latencies as near-real-time. Some describe 500-millisecond systems as real-time. The language is vague because the latency numbers are inconvenient — and because most buyers don't understand why those numbers matter operationally.

They do matter. The difference between a 50-millisecond scoring system and a 2-second system isn't just a speed preference. It's a fundamental difference in what each system can actually do to prevent fraud.

The authorization window

When a card transaction is initiated, it moves through a chain: merchant payment system to acquiring bank to card network to issuing bank. The issuing bank has a window to respond with an authorization decision before the network times out the request. That window is typically 500 to 2,000 milliseconds for standard card transactions, depending on the network and the transaction type.

If your fraud scoring system can't return a result inside that window, you have two options: make the authorization decision without the fraud score, or hold the transaction until the score arrives (which increases the probability of a timeout and a failed authorization). Most systems choose the first option, which means your fraud detection is running after the fact — scoring transactions that have already been authorized.
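The first option reduces to a timeout-and-fallback pattern. A minimal sketch — the 400 ms budget, the score_transaction stub, and the 0.9 decline threshold are all illustrative assumptions, not figures from any real network:

```python
import concurrent.futures

SCORE_TIMEOUT_MS = 400  # illustrative scoring budget, not a network-specified figure

def score_transaction(txn):
    # Stand-in for a model-scoring call; assumed to return a risk score in [0, 1].
    return 0.12

def authorize(txn, executor):
    """Try to score inside the window; fall back to deciding without a score."""
    future = executor.submit(score_transaction, txn)
    try:
        score = future.result(timeout=SCORE_TIMEOUT_MS / 1000)
        return ("decline" if score > 0.9 else "approve", score)
    except concurrent.futures.TimeoutError:
        # The first option from above: authorize without the fraud score.
        # A score that arrives late is only useful for post-authorization review.
        return ("approve", None)

with concurrent.futures.ThreadPoolExecutor() as pool:
    decision, score = authorize({"amount": 125.00}, pool)
```

The fallback branch is exactly where a slow scorer silently degrades into post-authorization monitoring: the gateway keeps working, but every timed-out request is an unscored authorization.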

Post-authorization scoring can still be useful for chargeback management, pattern detection, and model training. But it cannot stop fraud at the authorization stage. If that's your primary detection layer, you're not preventing fraud — you're documenting it.

Where 2 seconds fails

A 2-second response latency sounds fast in human terms. In payment infrastructure terms, it's frequently fatal for pre-authorization fraud detection. Here's why:

Card networks specify maximum response times. Visa's authorization timeout is typically around 1,000 milliseconds. Mastercard's is similar. These limits exist because payment experiences need to be fast — a checkout that hangs for several seconds converts at measurably lower rates and creates customer service escalations.

More importantly, modern payment flows involve multiple enrichment steps during the authorization path: 3DS authentication, velocity checks, address verification, device fingerprinting. Each step takes time. If your fraud scoring system consumes 2 seconds, you've already exceeded the total budget for the entire authorization chain.
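The budget problem is simple arithmetic. A sketch with assumed step timings — none of these figures are measured, they only illustrate how quickly a 1,000 ms window gets consumed:

```python
# Illustrative latency budget for a 1,000 ms authorization window.
# Every per-step figure below is an assumption for the sketch.
AUTH_WINDOW_MS = 1_000

steps_ms = {
    "network_transit": 150,
    "3ds_authentication": 300,
    "velocity_checks": 50,
    "address_verification": 80,
    "device_fingerprinting": 120,
}

remaining = AUTH_WINDOW_MS - sum(steps_ms.values())
print(f"budget left for fraud scoring: {remaining} ms")
# Under these assumptions only ~300 ms remains for scoring, so a
# 2,000 ms scorer cannot fit even if every other step were free.
```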

For A2A (account-to-account) transfers on instant payment rails, the window is even tighter. RTP and FedNow have strict latency requirements for participating institutions. A detection system that can't operate within those bounds either creates compliance issues or gets bypassed.

What runs inside 50 milliseconds

The architectural challenge of sub-100ms fraud scoring is that you can't do slow things. Database joins across large tables at query time — too slow. Complex rule evaluations with many conditions — potentially too slow at scale. Anything requiring network round-trips to external enrichment services in the critical path — probably too slow.

What works at sub-50ms: pre-computed feature vectors updated asynchronously and served from in-memory stores; lightweight model inference running on compiled, optimized model artifacts; fast lookup-based checks against pre-indexed data structures; and carefully bounded rule evaluation against pre-assembled context.
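The shape of that fast path can be sketched in a few lines: a lookup against an in-memory feature store (populated asynchronously, off the critical path) followed by lightweight inference. The store contents, feature names, and weights here are invented for illustration:

```python
import time

# Feature vectors are pre-computed by the enrichment path and kept in memory;
# the scoring path does only a dict lookup and a small dot product.
FEATURE_STORE = {
    "card_4242": [0.3, 1.0, 0.0],  # e.g. velocity, new_device, geo_mismatch
}
WEIGHTS = [0.5, 0.2, 0.8]          # illustrative linear model, not a real one
DEFAULT_FEATURES = [0.0, 0.0, 0.0]

def score(card_id):
    features = FEATURE_STORE.get(card_id, DEFAULT_FEATURES)
    return sum(w * f for w, f in zip(WEIGHTS, features))

start = time.perf_counter()
risk = score("card_4242")
elapsed_ms = (time.perf_counter() - start) * 1_000
```

The point of the sketch is what is absent: no query-time joins, no network round-trips, no rule trees of unbounded depth. Everything expensive happened earlier, asynchronously.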

Building this correctly requires separating the real-time scoring path (fast, pre-computed, lightweight) from the asynchronous enrichment path (comprehensive, updated continuously, feeding into real-time feature stores). These are different engineering problems with different infrastructure requirements. Most fraud platforms were built before this architectural discipline was well understood, and many of them still have the enrichment logic running in the authorization path where it degrades latency.

Latency under load

Published latency numbers are usually median or average figures. The number that matters operationally is P99 — the latency at the 99th percentile of requests. For a high-volume payment processor, a system that averages 50ms but has a P99 of 800ms means that roughly 1 in every 100 transactions is running close to the authorization timeout limit.
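The mean-versus-P99 gap is easy to demonstrate with synthetic data. Using a nearest-rank percentile over an assumed latency distribution — mostly fast requests with a small slow tail:

```python
import random

random.seed(7)
# Synthetic latencies: 985 fast requests plus a slow tail of 15 (assumed shape).
latencies_ms = [random.gauss(45, 5) for _ in range(985)] \
             + [random.uniform(400, 900) for _ in range(15)]

def percentile(samples, p):
    """Nearest-rank percentile of a list of samples."""
    ordered = sorted(samples)
    rank = max(0, round(p / 100 * len(ordered)) - 1)
    return ordered[rank]

mean_ms = sum(latencies_ms) / len(latencies_ms)
p99_ms = percentile(latencies_ms, 99)
# The mean sits comfortably near 50 ms while the P99 lands in the slow
# tail — the figure a vendor quotes and the figure the network sees.
```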

Load spikes — the kind that happen during flash sales, holiday peaks, or coordinated fraud attacks — can push P99 significantly higher than steady-state performance suggests. A system that performs well at average load may degrade substantially at 3x peak volume. This is where horizontal scaling architecture matters: the system needs to add capacity faster than transaction volume spikes.

When evaluating a fraud detection platform, ask specifically about P99 latency under peak load conditions, not average latency under test conditions. The gap between those numbers is often significant and tells you a lot about the platform's architecture.

Near-real-time use cases

To be fair, some fraud detection workflows don't require pre-authorization latency. Post-authorization batch analysis, daily risk reporting, model training pipelines, and chargeback investigation all operate on timescales where 2-second or even multi-second processing is perfectly adequate. "Near-real-time" as a category is legitimate for these workflows.

The problem is when near-real-time systems are marketed and deployed as pre-authorization fraud controls, then either miss the authorization window entirely or create customer-facing latency problems. This is common. We've reviewed deployments where a near-real-time scoring system was positioned in the authorization path and was transparently bypassed by the payment gateway whenever it exceeded the timeout threshold — meaning it was providing no effective fraud control at all, while still appearing in the compliance documentation as a detection layer.

The customer experience dimension

Authorization latency is also a conversion factor. Payment optimization research shows that checkout conversion drops measurably with each additional second of processing time. A fraud system that adds 1.5 seconds to every transaction isn't just slow — it's costing revenue independently of its fraud detection impact.

The engineering discipline of fast fraud scoring isn't just about fraud prevention. It's also about not degrading the payment experience for the 99.2% of customers who are legitimate. The best-designed fraud systems are nearly invisible to good customers — operating in the background, below the perceptual threshold of any delay. That invisibility is a product requirement, not just an engineering nicety.

Detectiv's scoring pipeline runs at sub-50ms P99 under standard production load conditions. That number was an explicit design target, not an afterthought. Building a payment fraud system that doesn't fit inside the authorization window is building a fraud monitoring system — which is a different product solving a different problem.

Sub-50ms scoring, built for production payment volumes

See Detectiv's latency benchmarks under load and understand how we fit inside the authorization window without compromise.

Request a Demo