Rollups & downsampling
Rollups let you define persistent downsampling policies that continuously materialise pre-aggregated copies of your raw metrics. Instead of aggregating millions of raw points on every query, queries that match an active policy read the pre-computed buckets directly — with a live tail computed on-the-fly for any data that has not yet been materialised.
Contents
- Concepts
- Defining a policy
- Applying policies at runtime
- Triggering and scheduling
- Query-time substitution
- Observability
- Crash safety and durability
- Invalidation
- HTTP API reference
- Rust embedded API
- Python bindings
- Constraints and requirements
Concepts
Policies
A rollup policy describes how one source metric should be downsampled:
| Field | Type | Description |
|---|---|---|
| id | string | Unique identifier for the policy. Used as a persistent key — renaming an id creates a new policy and discards the old materialisation. |
| metric | string | Source metric name to downsample. |
| matchLabels | Label[] | Optional label filter. The policy applies only to source series that carry all of these labels. An empty list matches every series for the metric. |
| interval | i64 | Bucket width in the same units as your timestamp precision (milliseconds, nanoseconds, etc.). Must be greater than zero. |
| aggregation | Aggregation | Aggregation function applied within each bucket. Must not be None. |
| bucketOrigin | i64 | Alignment origin for bucket boundaries. Bucket edges are computed as origin + N × interval. Defaults to 0. |
Bucket alignment
Bucket boundaries are computed with Euclidean floor division: the bucket containing a timestamp ts starts at origin + floor((ts − origin) / interval) × interval, with floor semantics so that timestamps before the origin still land in the correct bucket.
Synthetic metric storage
Materialised data is stored internally under synthetic metric names (reserved under the __tsink_rollup__: prefix) that are never visible through the metadata APIs (list_metrics, label enumeration, etc.).
Checkpoints
The engine tracks a checkpoint for every (policy, source series) pair — the timestamp through which that series has been fully materialised. On each worker run, only data between the checkpoint and the current stable boundary is processed, making runs incremental.
The stable boundary is the latest bucket start before the most-recently-observed timestamp. The in-progress bucket is deliberately excluded so that partial-bucket data is never committed to the materialised output.
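The alignment and stable-boundary rules above can be sketched in Python. This is an illustrative model, not the engine's actual code; timestamps, `interval`, and `origin` are plain integers in the store's timestamp units:

```python
def bucket_start(ts: int, interval: int, origin: int = 0) -> int:
    """Start of the bucket containing ts. Python's // is floor division,
    which matches Euclidean division for a positive interval, so
    timestamps before the origin are still aligned correctly."""
    return origin + ((ts - origin) // interval) * interval

def stable_boundary(latest_ts: int, interval: int, origin: int = 0) -> int:
    """Start of the bucket containing the most-recently-observed timestamp.
    Everything before this boundary is safe to materialise; the in-progress
    bucket [boundary, boundary + interval) is deliberately excluded."""
    return bucket_start(latest_ts, interval, origin)

# A point at t = -1 with interval 10 lands in the bucket starting at -10,
# not 0 — flooring, unlike truncating division.
assert bucket_start(-1, 10) == -10
assert bucket_start(25, 10) == 20
# With the latest observed point at t = 25, buckets up to [20, 30) are
# in progress, so only data before 20 is committed.
assert stable_boundary(25, 10) == 20
```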
Defining a policy
JSON representation
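For illustration, here is a plausible policy set that rolls millisecond-precision `http_requests_total` data up into 60-second sum buckets. Field names follow the table above; the metric name, label values, and the `{name, value}` label object shape are assumptions for this example:

```json
[
  {
    "id": "http-requests-1m-sum",
    "metric": "http_requests_total",
    "matchLabels": [{ "name": "region", "value": "eu-west" }],
    "interval": 60000,
    "aggregation": "Sum",
    "bucketOrigin": 0
  }
]
```

The outer array matters: policy-management calls take the complete desired set, not a single policy.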
Available aggregations
| Value | Description |
|---|---|
| Sum | Sum of all raw values in the bucket |
| Avg | Arithmetic mean |
| Min | Minimum value |
| Max | Maximum value |
| Count | Number of raw points |
| Last | Last value by timestamp |
None is not permitted in a rollup policy.
Applying policies at runtime
Policies are managed as an atomic set — each apply_rollup_policies call replaces the entire active policy set. There is no add or remove operation; submit the complete desired set every time.
When a new set is applied:
- Every submitted policy is validated and normalised.
- The engine acquires the run lock and waits for any in-flight worker to finish.
- Policies whose definitions are unchanged retain their existing checkpoints and materialised data.
- A modified policy (any field change) receives a new generation, clearing its checkpoint so it rematerialises from scratch.
- Policies that were removed have their checkpoints and pending state discarded; the synthetic materialised series are no longer queried. The underlying stored data is garbage-collected during subsequent compaction.
- The new policy set and updated state are atomically persisted to disk.
- A synchronous materialization pass runs immediately before the call returns.
- The current RollupObservabilitySnapshot is returned.
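The retain-or-rebuild decision described above can be modelled as a pure function over the old and new policy sets. This is an illustrative sketch; the real engine also persists state and runs the synchronous pass:

```python
def diff_policy_sets(old: dict, new: dict) -> dict:
    """old/new map policy id -> definition (here, a hashable tuple of fields).
    Returns an action per id: 'retain' keeps checkpoints and materialised
    data, 'rebuild' assigns a new generation and clears the checkpoint,
    'drop' discards all state for a removed policy."""
    actions = {}
    for pid, definition in new.items():
        if pid in old and old[pid] == definition:
            actions[pid] = "retain"   # definition unchanged: checkpoints survive
        else:
            actions[pid] = "rebuild"  # new or modified: rematerialise from scratch
    for pid in old:
        if pid not in new:
            actions[pid] = "drop"     # removed: checkpoints and pending state go
    return actions

old = {"a": ("m", 60, "Sum"), "b": ("m", 300, "Avg"), "d": ("m", 10, "Min")}
new = {"a": ("m", 60, "Sum"), "b": ("m", 300, "Max"), "c": ("n", 60, "Last")}
assert diff_policy_sets(old, new) == {
    "a": "retain", "b": "rebuild", "c": "rebuild", "d": "drop",
}
```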
Triggering and scheduling
Background worker
A background thread named tsink-rollups wakes every 5 seconds and runs a full materialization pass across all active policies. The worker is also unparked immediately after:
- Every committed write batch (to keep materialisation lag low).
- Every committed tombstone (delete operation).
- An explicit trigger_rollup_run call.
Forced run
Call trigger_rollup_run (or POST /api/v1/admin/rollups/run) to block until a full pass completes and return the resulting snapshot. Useful after bulk imports or in CI.
Query-time substitution
When a select call requests downsampling (a downsample option with an interval and aggregation), the engine checks whether a rollup candidate can satisfy the request before falling back to on-the-fly downsampling.
A candidate is accepted when all of the following hold:
- policy.interval equals the requested interval.
- policy.aggregation equals the requested aggregation.
- The query’s start timestamp is exactly aligned to a bucket boundary ((start − bucketOrigin) % interval == 0).
- The policy matches the requested metric and all requested labels (matchLabels is a subset of the series’ label set).
- A checkpoint exists for the source series and materializedThrough > start.
- No pending delete invalidation overlaps the query window.
When multiple candidates qualify, the engine prefers the most specific policy (the one with the most matchLabels), then the one with the greatest materializedThrough, and finally the lexicographically smallest id as a tiebreak.
Partial coverage
If the query window extends beyond materializedThrough, the engine:
- Reads the materialised buckets for [start, materializedThrough].
- Reads raw points for [materializedThrough, end] and downsamples them on-the-fly using the same bucket alignment origin as the matching policy.
- Merges the two result sets and deduplicates by timestamp.
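The stitching step above can be sketched as follows. Points are modelled as `(timestamp, value)` pairs; which side wins a timestamp collision is an assumption here (the materialised bucket is kept):

```python
def merge_partial(materialised: list, live_tail: list) -> list:
    """Merge materialised buckets with the on-the-fly tail, deduplicating
    by timestamp. Assumption: the materialised value wins on collision,
    since both sides were computed with the same bucket origin."""
    merged = {ts: v for ts, v in live_tail}
    merged.update(dict(materialised))  # materialised overrides the tail
    return sorted(merged.items())

materialised = [(0, 10.0), (60, 12.0)]  # buckets up to materializedThrough = 120
live_tail = [(60, 11.5), (120, 9.0)]    # recomputed tail, same alignment origin
assert merge_partial(materialised, live_tail) == [(0, 10.0), (60, 12.0), (120, 9.0)]
```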
Observability
Every policy-management call returns a RollupObservabilitySnapshot:
lag is None until the policy has processed at least one series. A lag of zero means every committed point is covered by the materialisation. Lag grows when the worker has not yet processed recent writes (typically less than 5 seconds under normal operation).
A rollups field with the same shape is included in the engine’s observability_snapshot() output, which is served at /metrics in Prometheus format as part of the server’s self-instrumentation.
Crash safety and durability
Rollup state is persisted in two files under <data_path>/.rollups/:
| File | Contents |
|---|---|
policies.json | Active policy set |
state.json | Checkpoints, generation counters, pending materializations, pending delete invalidations |
Both files are written atomically (write to a temporary file, atomic rename, fsync of the parent directory).
Pending materializations
Before writing any materialised rows to the storage engine, the engine records a pending materialization entry in state.json containing:
- The checkpoint advance it is about to make.
- The current generation for the policy.
On recovery, each pending entry is resolved against the stored checkpoint:
- If the materialised rows are already present (checkpoint already advanced ahead of the pending range), the entry is dropped as a no-op.
- If the stored checkpoint matches the pending entry’s generation and checkpoint, the materialization window is retried without writing duplicate buckets.
- If no checkpoint exists for the series (crash before any rows were written), the entry is dropped and the policy re-materialises from scratch.
Pending delete invalidations
Before a tombstone is committed, the engine records a pending delete invalidation listing the affected series IDs and policy IDs. After the tombstone becomes visible, the invalidation finalises: it bumps the generation and clears the checkpoint for the affected policies. If a crash occurs between staging and finalisation, the invalidation is re-applied automatically on startup.
Recovery sequence
On startup, after WAL replay:
- policies.json and state.json are loaded.
- Stale state for removed policies is discarded.
- Generation counters and checkpoints are reconciled.
- Any pending_delete_invalidations whose tombstones are already committed are immediately finalised.
- The background worker thread is started.
Invalidation
Materialised data is invalidated (checkpoint cleared, generation bumped) whenever the source data changes in a way that would corrupt pre-computed buckets:
Out-of-order writes
If a new point arrives with a timestamp earlier than the existing checkpoint for its source series, the affected policies are invalidated. The next worker run rematerialises from scratch under a new generation. Writes with timestamps at or above the checkpoint do not trigger invalidation — they extend the materialization window on the next run.
Deletes
A range delete on a source series invalidates any policy whose materialised data overlaps the deleted time range. The invalidation is staged durably before the tombstone commits, then finalised after it does. The rollup worker is immediately unparked to begin rebuilding.
HTTP API reference
POST /api/v1/admin/rollups/apply
Atomically replace the active policy set with the submitted list. Runs a synchronous materialization pass before returning.
Request body: JSON array of policy objects.
Response: RollupObservabilitySnapshot (JSON).
Notes:
- Pass an empty array [] to remove all policies.
- Duplicate id values in the submitted list are rejected.
- Policy metric must not begin with __tsink_rollup__:.
POST /api/v1/admin/rollups/run
Trigger an immediate, synchronous materialization pass. Blocks until the pass completes.
Request body: empty.
Response: RollupObservabilitySnapshot (JSON).
GET /api/v1/admin/rollups/status
Return the current RollupObservabilitySnapshot without running a materialization pass.
Response: RollupObservabilitySnapshot (JSON).
Rust embedded API
Python bindings
Constraints and requirements
| Constraint | Details |
|---|---|
| Requires data_path | Rollups are not available for in-memory-only storage instances. apply_rollup_policies returns TsinkError::InvalidConfiguration if no data path was configured. |
| Non-empty id | Policy identifiers must be non-empty strings. |
| Positive interval | interval must be greater than zero. |
| Aggregation must not be None | The None aggregation is rejected during policy validation. |
| No internal metric names | The source metric must not begin with __tsink_rollup__:. |
| Atomic set semantics | apply_rollup_policies replaces the entire policy set. To add a policy, submit the existing policies plus the new one. |
| Bucket-aligned query start | Query-time substitution only activates when the query’s start is exactly aligned to a bucket boundary relative to bucketOrigin. Mis-aligned queries fall back to on-the-fly downsampling. |
| Generation rebuilds on modification | Any field change to an existing policy triggers a full rematerialisation. Modifying interval or aggregation of a high-cardinality policy temporarily increases storage usage until old-generation data is compacted away. |