The monitoring says 95ms. A customer emails to say the app feels slow. You check the dashboard. Everything looks fine.
Both can be true at the same time.
What the average hides
p50 latency, the median, tells you what the middle of your distribution looks like. If half your requests complete in 95ms or less, your p50 is 95ms. That sounds fast. What it doesn't tell you is what the other half looks like, let alone the slowest 1%.
At 50,000 API requests per day, one percent is 500 requests. If those 500 requests take 4 seconds (a timeout, a cache miss on cold infrastructure, a query hitting a table lock), your dashboard stays green while as many as 500 real users per day sit in front of a spinner.
p99 latency tells you what's happening at the edge of your distribution. It's the threshold that 99% of requests fall below. If your p99 is 3.2 seconds, 1 in 100 requests takes 3.2 seconds or longer, which is nothing like what your p50 suggests.
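To see how far apart these numbers can sit, here is a minimal sketch in Python using the nearest-rank definition of a percentile. The latency values are invented for illustration (980 requests at 95ms, 20 at 4 seconds), and `percentile` is a hypothetical helper, not the API of any particular monitoring tool.

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample such that at least
    p percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]

# Invented data: 98% of requests are fast, 2% hit a slow path.
latencies_ms = [95] * 980 + [4000] * 20

p50 = percentile(latencies_ms, 50)
p99 = percentile(latencies_ms, 99)
mean = sum(latencies_ms) / len(latencies_ms)

print(f"p50={p50}ms  p99={p99}ms  mean={mean:.1f}ms")
```

With this sample the p50 is 95ms, the mean is 173.1ms, and the p99 is 4,000ms. A dashboard showing only the first two would never hint at the third.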
Why slow tails cause churn, not complaints
Most users won't file a support ticket. They'll close the tab.
A slow experience during a critical action — checkout, file upload, search — just ends the session. The signal that shows up first is rarely a performance alert. It's a small drop in conversion rate. A slight uptick in session abandonment. A support ticket that says "your app is slow sometimes." That word "sometimes" is p99.
The relationship between latency and bounce rate is well-documented. The specific numbers vary by product, but the direction doesn't: slow tail experiences drive abandonment at a higher rate than average latency predicts. Users don't experience averages. They experience individual requests.
Synthetic monitoring versus what users experience
Most teams set up synthetic monitoring early — a scheduled check that hits a URL every few minutes from a data center and verifies the response lands under a threshold. That's a reasonable floor check. It catches crashes and major regressions.
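A check like that can be sketched in a few lines of Python. Everything here is illustrative: `synthetic_check` and `fake_fetch` are hypothetical names, the 500ms threshold is an assumption, and a real probe would replace `fake_fetch` with an actual HTTP request run on a schedule.

```python
import time
from typing import Callable

def synthetic_check(fetch: Callable[[], int], threshold_ms: float) -> dict:
    """Run one probe: call fetch(), time it, compare against the threshold.

    fetch is a hypothetical callable that performs the request and
    returns an HTTP status code.
    """
    start = time.perf_counter()
    try:
        status = fetch()
    except Exception:
        status = None  # treat any network error or timeout as a failure
    elapsed_ms = (time.perf_counter() - start) * 1000
    ok = status == 200 and elapsed_ms <= threshold_ms
    return {"ok": ok, "status": status, "elapsed_ms": elapsed_ms}

# Stand-in for a real HTTP call so the sketch runs offline.
def fake_fetch() -> int:
    time.sleep(0.01)  # pretend the server took about 10 ms
    return 200

result = synthetic_check(fake_fetch, threshold_ms=500)
print(result["ok"], round(result["elapsed_ms"]))
```

Note what the probe measures: one request, from one place, with one payload. It answers "is it up and under the threshold from here," and nothing more.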
What it doesn't catch: the real experience of a user on a mid-range Android device on 4G on a different continent. That user isn't hitting your data center in Virginia on a clean 10ms connection. They're hitting a CDN edge node that may or may not have the right cache state. Their network round-trip adds 80–120ms before your server touches the request. Their device parses and renders JavaScript on hardware that's two CPU generations behind the laptop you run tests on.
Synthetic tests tell you the product isn't broken. They don't tell you it's fast for the users who are actually using it.
Real-user monitoring captures what actually happens: every request, every page load, from actual devices on actual networks, measured at the browser. The gap between your synthetic p50 and your RUM p95 is the gap between "works in QA" and "feels fast in production."
A concrete case
We built a document processing tool for a client. Synthetic monitoring was excellent — the upload endpoint responded in 340ms on average. The client's sales team started hearing that uploads felt slow. We pulled in real-user data.
The median was fine. The p95 was 2.8 seconds. Filtering by device type: mobile users on Android were consistently hitting 4–6 second upload times.
The cause was two things sitting in the request handler that should have been elsewhere. A synchronous image resize step ran inline instead of being pushed to a background job. A file validation step loaded the entire file into memory before checking the header. Neither issue showed up in synthetic tests, which ran from a fast connection with small test files. Real users were uploading 8–12MB PDFs over 4G.
Both fixes were small. The image resize moved out of the request path into a background job, and the validation switched to streaming. p95 on mobile dropped to under 700ms. The sales feedback stopped.
The code wasn't bad. The monitoring wasn't showing us what mobile users were experiencing.
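For readers who want to picture the streaming half of that fix, here is a rough Python sketch. It is not the client's code. `store_pdf_upload` is a hypothetical function, the 64KB chunk size is arbitrary, and it assumes a buffered binary stream whose `read(n)` returns `n` bytes unless the stream ends first. The only hard fact it relies on is that PDF files begin with the bytes `%PDF-`. The background resize job is not shown.

```python
import io
from typing import BinaryIO

PDF_MAGIC = b"%PDF-"  # signature at the start of every PDF file

def store_pdf_upload(src: BinaryIO, dst: BinaryIO, chunk_size: int = 64 * 1024) -> int:
    """Validate the header from the first few bytes, then copy the rest
    in fixed-size chunks so the whole file is never held in memory."""
    header = src.read(len(PDF_MAGIC))
    if header != PDF_MAGIC:
        raise ValueError("upload does not look like a PDF")
    dst.write(header)
    total = len(header)
    while chunk := src.read(chunk_size):
        dst.write(chunk)
        total += len(chunk)
    return total

# In-memory stand-ins for the incoming request body and the storage target.
upload = io.BytesIO(b"%PDF-1.7\n" + b"x" * 200_000)
stored = io.BytesIO()
written = store_pdf_upload(upload, stored)
print(written)
```

A bad upload is rejected after reading five bytes instead of after buffering 12MB, and peak memory per request is bounded by the chunk size rather than the file size.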
What to actually measure
For any production system handling meaningful traffic, you need three things:
Percentile latency by endpoint. p50, p95, and p99, broken out per route — not aggregated across the whole service. An API that's slow on one endpoint will look fine if you're only watching the fleet average.
Real-user metrics from the browser. Server response time is one input. Time to interactive, largest contentful paint, and first input delay (superseded in Google's Core Web Vitals by interaction to next paint) capture the full user experience, including JavaScript execution, rendering, and network transfer. These are what the user actually waits for.
Segmentation by geography and device type. Even rough segmentation (mobile vs. desktop, US vs. international) exposes the gaps that averages hide. A 200ms server response time can become a 900ms user experience for someone in a high-latency region on a slower network.
The third one is where most teams fall short. "Average latency by endpoint" creates the illusion of coverage while the 4G user in a distant region experiences something you've never measured.
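Here is a small Python sketch of what segmentation buys you. The events are invented, the field names (`route`, `device`, `ms`) are assumptions about what a RUM pipeline might record, and `percentile` is the same hypothetical nearest-rank helper you would use for any percentile calculation.

```python
import math
from collections import defaultdict

def percentile(samples, p):
    """Nearest-rank percentile of a non-empty list of numbers."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]

def p95_by_segment(events):
    """Group latency samples by (route, device) and compute p95 per group."""
    groups = defaultdict(list)
    for event in events:
        groups[(event["route"], event["device"])].append(event["ms"])
    return {key: percentile(values, 95) for key, values in groups.items()}

# Invented traffic: desktop dominates the volume, mobile has the slow tail.
events = (
    [{"route": "/upload", "device": "desktop", "ms": 300}] * 850
    + [{"route": "/upload", "device": "desktop", "ms": 600}] * 50
    + [{"route": "/upload", "device": "mobile", "ms": 900}] * 60
    + [{"route": "/upload", "device": "mobile", "ms": 5000}] * 40
)

overall_p95 = percentile([e["ms"] for e in events], 95)
segments = p95_by_segment(events)

print("overall p95:", overall_p95)
for key, value in sorted(segments.items()):
    print(key, value)
```

In this made-up data the endpoint's overall p95 is 900ms, desktop p95 is 600ms, and mobile p95 is 5,000ms. The mobile problem is invisible until the data is split, because desktop traffic outnumbers mobile nine to one.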
The question to ask a software studio
Ask any engineering team you're evaluating: what does your production performance monitoring look like, and when did it last prompt a change?
If the answer is "we have dashboards," ask which percentiles they track. Ask whether the dashboards have ever triggered an alert. Ask whether they've instrumented real-user metrics or only server-side timing.
A team that has only ever watched p50 has optimized for the median while the tail keeps churning. The performance work that matters happens at the edge of the distribution, because that's where the users you're losing are.