Section 02
How Datadog RUM works
To understand Datadog RUM, you need a handful of core architectural concepts. These are not Datadog-specific inventions; they are the vocabulary of modern observability engineering. Datadog's implementation of them, however, is what shapes the product's capabilities and its cost structure.
Event ingestion
The central architectural concept in Datadog RUM is event ingestion. Rather than storing raw session recordings as the primary data model (as a tool like FullStory does), Datadog ingests discrete, structured events — page views, user actions, errors, long tasks, resource loads — each as a JSON record with a defined schema. A page view event carries metadata: URL, viewport, referrer, load timing, LCP, INP, CLS. A user action event carries the element clicked, the latency of any resulting network request, and a trace ID linking it to the backend.
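As a rough illustration of what "discrete, structured events with a defined schema" means in practice, the sketch below models three event types as a TypeScript discriminated union. The field names (sessionId, lcpMs, targetName and so on) are assumptions chosen for readability and are not Datadog's actual RUM schema.

```typescript
// Hypothetical event shapes for illustration only; not Datadog's real schema.
interface RumBaseEvent {
  sessionId: string;
  timestamp: number; // epoch milliseconds
}

interface RumViewEvent extends RumBaseEvent {
  type: "view";
  url: string;
  referrer: string;
  lcpMs?: number; // Largest Contentful Paint
  inpMs?: number; // Interaction to Next Paint
  cls?: number; // Cumulative Layout Shift (unitless score)
}

interface RumActionEvent extends RumBaseEvent {
  type: "action";
  targetName: string; // element the user interacted with
  resourceLatencyMs?: number; // latency of a resulting network request
  traceId?: string; // links the action to a backend trace
}

interface RumErrorEvent extends RumBaseEvent {
  type: "error";
  message: string;
  stack?: string;
}

type RumEvent = RumViewEvent | RumActionEvent | RumErrorEvent;

// The "type" field tells the compiler (and any consumer) which fields exist.
function summarizeEvent(event: RumEvent): string {
  switch (event.type) {
    case "view":
      return `view ${event.url} LCP=${event.lcpMs ?? "n/a"}ms`;
    case "action":
      return `action on ${event.targetName}` + (event.traceId ? ` trace=${event.traceId}` : "");
    case "error":
      return `error ${event.message}`;
  }
}
```

Because every record declares its type, downstream code can filter and aggregate by field without parsing free-form recordings.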
This event-centric model has important implications. It means Datadog can run arbitrary queries over your frontend telemetry, such as "show me all sessions where LCP exceeded 4 seconds on Chrome Mobile in Germany, sorted by the user's revenue tier", with the same query engine used across the rest of the Datadog platform. The tradeoff is that cost scales with volume: more sessions produce more events to ingest and store, and billing is per session.
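To make the example query concrete, here is a minimal in-memory sketch of the same filter-and-sort logic. It is not Datadog's query language; the SessionRecord shape and the revenueTier attribute are hypothetical stand-ins for fields you would attach to your own events.

```typescript
// Hypothetical flattened session record; field names are assumptions.
interface SessionRecord {
  sessionId: string;
  browser: string;
  country: string; // ISO 3166-1 alpha-2 code
  lcpMs: number;
  revenueTier: number; // custom attribute; higher means more valuable
}

// Sessions above an LCP threshold for one browser and country,
// ordered from highest to lowest revenue tier.
function findSlowSessions(
  records: SessionRecord[],
  lcpThresholdMs: number,
  browser: string,
  country: string
): SessionRecord[] {
  return records
    .filter((r) => r.lcpMs > lcpThresholdMs && r.browser === browser && r.country === country)
    .sort((a, b) => b.revenueTier - a.revenueTier);
}
```

The query is only possible because browser, country, LCP, and revenue tier are all structured fields on the same record.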
The RUM agent
Datadog instruments your application through a JavaScript SDK (approximately 25–40KB gzipped), loaded either from an npm package or a CDN snippet. The SDK instruments browser APIs automatically. It uses the PerformanceObserver API to capture Core Web Vitals (LCP, INP, CLS) and navigation timing. It wraps native fetch and XMLHttpRequest to capture network request timing and propagate trace context headers. It listens for uncaught exceptions and unhandled promise rejections and reports them with stack traces, which Datadog can unminify once you upload source maps. Errors caught by a framework-level handler, such as a React error boundary, generally do not reach those global listeners, so they need to be forwarded to the SDK explicitly. User interactions (clicks, form submissions, route changes in SPAs) are captured as action events.
The SDK batches events locally and flushes them to Datadog's intake endpoints at intervals or on page visibility change, minimising the impact on the user's connection. This is a well-solved engineering problem across the RUM category; Datadog's implementation is mature and handles edge cases like bfcache navigation, service workers, and cross-origin resource timing.
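The batching behaviour described above can be sketched as a small buffer that flushes when it fills up or when the caller asks it to. This is a simplified illustration, not Datadog's implementation; the EventBatcher class and its maxBatchSize parameter are hypothetical.

```typescript
// Minimal batching sketch; not the Datadog SDK's actual code.
class EventBatcher<T> {
  private buffer: T[] = [];
  private readonly send: (batch: T[]) => void;
  private readonly maxBatchSize: number;

  constructor(send: (batch: T[]) => void, maxBatchSize: number) {
    this.send = send;
    this.maxBatchSize = maxBatchSize;
  }

  // Queue an event; flush immediately once the batch is full.
  add(event: T): void {
    this.buffer.push(event);
    if (this.buffer.length >= this.maxBatchSize) {
      this.flush();
    }
  }

  // Send whatever is queued. Safe to call when the buffer is empty.
  flush(): void {
    if (this.buffer.length === 0) return;
    const batch = this.buffer;
    this.buffer = [];
    this.send(batch);
  }

  get pending(): number {
    return this.buffer.length;
  }
}
```

In a browser, flush() would typically also be called from a timer and from a visibilitychange listener, with the batch sent via navigator.sendBeacon or a keepalive fetch so that it survives the page being hidden or closed.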
Session sampling and session replay
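Datadog RUM's session replay feature can capture a DOM-level replay of user sessions: mouse movements, clicks, scrolls, and DOM mutations are recorded by a MutationObserver-based serialiser. The result is not a screen video. The recorded snapshot and subsequent mutations are reassembled in Datadog's replay viewer, giving you a video-like reconstruction of what the user experienced that is faithful to the page's DOM and CSS rather than guaranteed pixel-accurate.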
Because storing full session replays at 100% of traffic is expensive, Datadog's SDK supports configurable session sampling. You set a sessionReplaySampleRate — typically 5–20% for production traffic — meaning only that fraction of sessions get full replay recording. You can also configure conditional sampling: record 100% of sessions that contain a JavaScript error, but only 5% of sessions overall. This keeps costs manageable while ensuring you capture the sessions most likely to be useful for debugging.
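The sampling logic above can be sketched as two decisions: one at session start, and one when an error occurs. The option names sessionSampleRate and sessionReplaySampleRate mirror the SDK's configuration keys, but the decision functions and the alwaysRecordOnError flag are hypothetical, and the sketch assumes the replay rate applies only to sessions that are already tracked.

```typescript
// Sampling sketch; alwaysRecordOnError is a hypothetical flag, not an SDK option.
interface SamplingConfig {
  sessionSampleRate: number; // percent of sessions tracked at all (0-100)
  sessionReplaySampleRate: number; // percent of tracked sessions recorded (0-100)
  alwaysRecordOnError: boolean;
}

interface SamplingDecision {
  tracked: boolean;
  replay: boolean;
}

// Rolls are numbers in [0, 100), e.g. Math.random() * 100.
// They are parameters so the decision is deterministic and testable.
function decideAtSessionStart(
  cfg: SamplingConfig,
  trackRoll: number,
  replayRoll: number
): SamplingDecision {
  const tracked = trackRoll < cfg.sessionSampleRate;
  const replay = tracked && replayRoll < cfg.sessionReplaySampleRate;
  return { tracked, replay };
}

// Upgrade a tracked session to full replay once it hits an error.
function decideOnError(cfg: SamplingConfig, current: SamplingDecision): SamplingDecision {
  if (current.tracked && cfg.alwaysRecordOnError) {
    return { tracked: true, replay: true };
  }
  return current;
}
```

One consequence of this design is that a recording started in response to an error can only cover what happens from that point onwards, unless the recorder was already buffering earlier activity.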
Cardinality
In observability, cardinality is the number of unique values a given tag or dimension can take. Browser name (Chrome, Firefox, Safari, Edge, and a few others) is low-cardinality: a handful of values. Country is medium-cardinality: around 200 values. User ID, session ID, and URL path (on sites with millions of product pages) are high-cardinality: potentially billions of unique values. High-cardinality data is expensive to index because the index itself grows in proportion to the number of unique values.
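A quick way to build intuition for cardinality is to count distinct values per tag in a sample of events. The sketch below does that, and also computes the worst-case number of tag combinations, which is the product of the individual cardinalities. The function names and sample tags are illustrative.

```typescript
// Count the number of distinct values seen for each tag key.
function tagCardinality(events: ReadonlyArray<Record<string, string>>): Map<string, number> {
  const seen = new Map<string, Set<string>>();
  for (const tags of events) {
    for (const [key, value] of Object.entries(tags)) {
      let values = seen.get(key);
      if (!values) {
        values = new Set<string>();
        seen.set(key, values);
      }
      values.add(value);
    }
  }
  const result = new Map<string, number>();
  seen.forEach((values, key) => result.set(key, values.size));
  return result;
}

// Upper bound on distinct tag combinations an index might have to hold.
function worstCaseCombinations(cardinalities: Map<string, number>): number {
  let product = 1;
  cardinalities.forEach((n) => {
    product *= n;
  });
  return product;
}
```

Adding a single high-cardinality tag such as user ID multiplies the worst case by the number of users, which is why such tags dominate indexing cost.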
Datadog manages cardinality by separating ingestion from indexing, the same idea behind the Indexed Spans model in Datadog APM. When events are ingested, Datadog makes a retention decision: some events are retained in full detail for querying (indexed); others are summarised into aggregate metrics. You configure retention filters that determine which events to keep. This is a powerful but non-trivial configuration task: getting it wrong means either paying too much (retaining everything) or losing debugging fidelity (retaining too little).
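The indexed-versus-aggregated split can be sketched as a pass over incoming events: anything matching a retention filter is kept whole, and everything else contributes only to a running aggregate. The types and filter format below are hypothetical and do not reflect how Datadog's retention filters are actually configured.

```typescript
// Hypothetical types; Datadog's retention filters are configured differently.
interface TelemetryEvent {
  type: string;
  durationMs: number;
  hasError: boolean;
}

interface RetentionFilter {
  name: string;
  matches: (event: TelemetryEvent) => boolean;
}

interface RetentionResult {
  indexed: TelemetryEvent[]; // kept in full detail, queryable
  aggregated: { count: number; totalDurationMs: number }; // summary only
}

function applyRetention(events: TelemetryEvent[], filters: RetentionFilter[]): RetentionResult {
  const result: RetentionResult = {
    indexed: [],
    aggregated: { count: 0, totalDurationMs: 0 },
  };
  for (const event of events) {
    if (filters.some((f) => f.matches(event))) {
      result.indexed.push(event);
    } else {
      // The individual event is dropped; only its contribution to the totals survives.
      result.aggregated.count += 1;
      result.aggregated.totalDurationMs += event.durationMs;
    }
  }
  return result;
}
```

With no filters, every event collapses into the aggregate and nothing can be inspected individually; with a match-everything filter, every event is indexed and cost is at its maximum.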
APM correlation
One of Datadog RUM's most distinctive features is its ability to correlate frontend events with backend distributed traces. When the RUM SDK makes an instrumented network request, it injects Datadog's trace propagation headers (W3C traceparent or Datadog's own format) into the HTTP request. If your backend is also instrumented with Datadog APM, the same trace ID flows from browser to web server to microservice to database query. A slow LCP caused by a slow API call surfaces in Datadog RUM as a clickable link to the complete distributed trace — showing every service, every database call, every log line emitted during that request.
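To show what header injection involves, the sketch below builds a W3C traceparent header, which has the form version-traceid-parentid-flags with a 32-hex-character trace ID and a 16-hex-character parent ID. The helper names are hypothetical, and Math.random is used only to keep the example dependency-free; it is not how the Datadog SDK generates IDs.

```typescript
// Generate a lowercase hex string of the given length (not cryptographically secure).
function randomHex(length: number): string {
  let out = "";
  for (let i = 0; i < length; i++) {
    out += Math.floor(Math.random() * 16).toString(16);
  }
  return out;
}

// The W3C Trace Context spec treats all-zero IDs as invalid, so retry if we get one.
function nonZeroHex(length: number): string {
  let id = randomHex(length);
  while (/^0+$/.test(id)) {
    id = randomHex(length);
  }
  return id;
}

// Return a copy of the headers with a traceparent entry added,
// plus the trace ID so the caller can attach it to the frontend event.
function injectTraceparent(
  headers: Record<string, string>,
  sampled: boolean
): { headers: Record<string, string>; traceId: string } {
  const traceId = nonZeroHex(32);
  const parentId = nonZeroHex(16);
  const flags = sampled ? "01" : "00";
  return {
    headers: { ...headers, traceparent: `00-${traceId}-${parentId}-${flags}` },
    traceId,
  };
}
```

The backend tracer reads the same trace ID from the header and continues the trace, which is what allows a frontend event and a backend span to be joined later.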
Billing model
Datadog RUM charges per session. A session ends after 15 minutes of inactivity and is capped at 4 hours of continuous activity, so a user who visits your site, leaves for two hours, and returns starts a new session. For SaaS products where users spend long periods in-app, the 4-hour cap keeps one long working stretch from fragmenting into many billable sessions. For high-traffic content sites where users make many short visits, session counts accumulate quickly. At 100,000 sessions per month, costs run approximately $200–$400/month depending on features enabled. At 1 million sessions per month, costs can reach $2,000–$5,000+ before session replay storage. This makes Datadog RUM cost-predictable per session but potentially expensive at scale, especially for consumer-facing sites with high-frequency casual visitors.
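The effect of session boundaries on the bill can be estimated with a small calculation. The sketch below counts sessions from activity timestamps and converts a session count into a cost range. The inactivity timeout and maximum duration are parameters; the 15-minute and 4-hour values in the usage note reflect Datadog's documented behaviour as far as I know, and the per-thousand rates are simply back-calculated from the approximate figures in this section, not taken from a price list.

```typescript
// Count billable sessions for one user from activity timestamps (epoch ms).
// A new session starts after a gap longer than inactivityMs,
// or once the current session has run longer than maxDurationMs.
function countSessions(activity: number[], inactivityMs: number, maxDurationMs: number): number {
  const sorted = activity.slice().sort((a, b) => a - b);
  let sessions = 0;
  let sessionStart = 0;
  let lastSeen = 0;
  for (const t of sorted) {
    const expired =
      sessions === 0 || t - lastSeen > inactivityMs || t - sessionStart > maxDurationMs;
    if (expired) {
      sessions += 1;
      sessionStart = t;
    }
    lastSeen = t;
  }
  return sessions;
}

// Cost range in dollars for a monthly session count, given low and high
// per-1,000-session rates (assumed values, supplied by the caller).
function estimateMonthlyCost(
  sessions: number,
  lowPerThousand: number,
  highPerThousand: number
): { low: number; high: number } {
  const thousands = sessions / 1000;
  return { low: thousands * lowPerThousand, high: thousands * highPerThousand };
}
```

For example, with a 15-minute timeout and a 4-hour cap, a user active at minutes 0, 5, and 10 who returns at minute 130 counts as two sessions, and estimateMonthlyCost(100000, 2, 4) reproduces the $200 to $400 range quoted above.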