Tiered storage
tsink supports automatic hot → warm → cold tiered storage backed by an object store (or any locally mounted volume). Segments are moved between tiers by the post-flush maintenance pipeline based on configurable age windows, and reads are automatically routed to the correct tier at query time.
Overview
Without tiered storage, all persisted segments live on the local data volume. Tiered storage extends that with a second volume, the object-store root, that holds three subdirectories:

| Tier | Location | Data age |
|---|---|---|
| Hot | `{object_store_root}/hot/` | Within `hot_retention_window` of the ingestion frontier |
| Warm | `{object_store_root}/warm/` | Older than `hot_retention_window`, within `warm_retention_window` |
| Cold | `{object_store_root}/cold/` | Older than `warm_retention_window`, within the global retention window |
If no object-store root is configured, tiering is disabled: all segments remain under `data_path` and no warm/cold movement ever occurs.
Enabling tiered storage
Rust StorageBuilder
Configure tiering on the builder by combining `with_object_store_path` with `with_tiered_retention_policy`; note that `with_tiered_retention_policy` implicitly enables retention enforcement.
Server binary
The server binary exposes the same options as CLI flags; see Server CLI flags below.
Configuration reference
StorageBuilder methods
| Method | Default | Description |
|---|---|---|
| `with_object_store_path(path)` | None (no tiering) | Sets the root path for warm/cold segment storage. Typically a path on an object-store-backed volume separate from `data_path`. Setting this enables tiering. |
| `with_tiered_retention_policy(hot, warm)` | — | Sets the hot and warm cutoff windows and enables retention enforcement. |
| `with_retention(duration)` | 14 days | Global data expiry. Also used as the fallback value for unconfigured tier windows. |
| `with_runtime_mode(mode)` | ReadWrite | `ComputeOnly` for query-only nodes; see Compute-only mode. |
| `with_remote_segment_refresh_interval(duration)` | ~5 s | How often the segment catalog is re-read from the object store. |
| `with_mirror_hot_segments_to_object_store(bool)` | false | Copy hot segments into `{object_store_root}/hot/` as they are flushed; see Hot segment mirroring. |
| `with_remote_segment_cache_policy(policy)` | MetadataOnly | Controls remote chunk prefetching. MetadataOnly prefetches chunk index metadata only; payload bytes are mmap'd on demand. |
Server CLI flags
| Flag | Default | Description |
|---|---|---|
| `--object-store-path PATH` | unset | Object-store root; enables tiering |
| `--hot-tier-retention DURATION` | falls back to `--retention` | Age cutoff for hot→warm migration |
| `--warm-tier-retention DURATION` | falls back to `--retention` | Age cutoff for warm→cold migration |
| `--retention DURATION` | 14d | Global expiry |
| `--storage-mode MODE` | read-write | `read-write` or `compute-only` |
| `--remote-segment-refresh-interval DURATION` | ~5 s | Catalog refresh TTL |
| `--mirror-hot-segments-to-object-store BOOL` | false | Mirror hot segments on flush |
All DURATION values accept the units `s`, `m`, `h`, and `d` (e.g. `7d`, `48h`).
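A minimal parser for this duration syntax (a hypothetical helper, not part of tsink; shown only to pin down the accepted format):

```python
from datetime import timedelta

# Unit suffixes accepted by the duration flags: s, m, h, d.
_UNITS = {"s": "seconds", "m": "minutes", "h": "hours", "d": "days"}

def parse_duration(text: str) -> timedelta:
    """Parse strings like '7d' or '48h' into a timedelta."""
    value, unit = text[:-1], text[-1]
    if unit not in _UNITS:
        raise ValueError(f"unknown duration unit: {unit!r}")
    return timedelta(**{_UNITS[unit]: int(value)})
```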
Directory layout
When tiered storage is configured, the object-store root contains the `hot/`, `warm/`, and `cold/` tier directories plus the segment catalog. Each `seg-<id>` directory contains the segment's data files and a `manifest.json`. The segment catalog at the root provides a fast, authoritative index of all segments and their tiers without walking the full directory tree.
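An indicative sketch of that layout, reconstructed from the description above (exact data-file names inside a segment directory will vary):

```text
{object_store_root}/
├── segment_catalog.json
├── hot/
│   └── seg-<id>/
│       ├── manifest.json
│       └── …data files…
├── warm/
│   └── seg-<id>/…
└── cold/
    └── seg-<id>/…
```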
Tier lifecycle
Ingestion and flush
New data is always written to the local write buffer and WAL. When the flush pipeline seals a memory chunk, it writes a new persisted segment to local hot storage. If `mirror_hot_segments_to_object_store` is enabled, an additional copy is placed under `{object_store_root}/hot/`.
Post-flush maintenance
After every flush, the maintenance pipeline computes a `RetentionTierPolicy` using:
- `retention_cutoff`: timestamps older than this are expired.
- `hot_cutoff`: `now − hot_retention_window`; segments with `max_ts < hot_cutoff` move to warm.
- `warm_cutoff`: `now − warm_retention_window`; segments with `max_ts < warm_cutoff` move to cold.
| Condition | Action |
|---|---|
| `max_ts < retention_cutoff` | Delete segment |
| Segment spans the retention boundary (`min_ts < retention_cutoff ≤ max_ts`) | Rewrite segment to strip expired data, then move |
| `max_ts < warm_cutoff` and tiering enabled | Move segment to cold tier |
| `max_ts < hot_cutoff` and tiering enabled | Move segment to warm tier |
| Otherwise | Leave in current tier |
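Read top to bottom, the table is a first-match rule chain. A minimal Python sketch of that chain (the `Segment` and `Action` types are illustrative, not tsink's actual `RetentionTierPolicy` API):

```python
from dataclasses import dataclass
from enum import Enum

class Action(Enum):
    EXPIRE = "expire"            # delete the whole segment
    REWRITE = "rewrite"          # strip expired rows, then move
    MOVE_TO_COLD = "cold"
    MOVE_TO_WARM = "warm"
    KEEP = "keep"

@dataclass
class Segment:
    min_ts: int  # timestamp bounds, e.g. unix millis
    max_ts: int

def decide(seg, retention_cutoff, hot_cutoff, warm_cutoff, tiering_enabled=True):
    """Apply the decision table's rules top to bottom; first match wins."""
    if seg.max_ts < retention_cutoff:
        return Action.EXPIRE
    if seg.min_ts < retention_cutoff <= seg.max_ts:
        return Action.REWRITE
    if tiering_enabled and seg.max_ts < warm_cutoff:
        return Action.MOVE_TO_COLD
    if tiering_enabled and seg.max_ts < hot_cutoff:
        return Action.MOVE_TO_WARM
    return Action.KEEP
```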
Move semantics
Tier moves are copy-then-delete:
- The segment directory is copied to the destination path under a staging name.
- Once the copy is verified (fingerprint checked), the staged copy is atomically promoted.
- The segment catalog is updated and swapped into the visible persisted index.
- Only after the new location is visible to queries is the source directory retired.
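A rough sketch of that copy-then-delete sequence, assuming a plain filesystem and a whole-directory SHA-256 fingerprint (tsink's actual fingerprint format and the catalog-swap step are not shown):

```python
import hashlib
import shutil
from pathlib import Path

def _fingerprint(root: Path) -> str:
    """Order-stable digest over relative file names and contents under root."""
    h = hashlib.sha256()
    for f in sorted(root.rglob("*")):
        if f.is_file():
            h.update(f.relative_to(root).as_posix().encode())
            h.update(f.read_bytes())
    return h.hexdigest()

def move_segment(src: Path, dst: Path) -> None:
    """Copy-then-delete tier move: stage, verify, promote, retire."""
    staged = dst.with_name(dst.name + ".staged")
    shutil.copytree(src, staged)                   # 1. copy under a staging name
    if _fingerprint(staged) != _fingerprint(src):  # 2. verify the copy
        shutil.rmtree(staged)
        raise IOError("fingerprint mismatch during tier move")
    staged.rename(dst)                             # 3. atomic promote
    # 4. (catalog update / visible-index swap happens here in the real pipeline)
    shutil.rmtree(src)                             # 5. retire the source
```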
Segment catalog
The catalog (segment_catalog.json in the object-store root) is a JSON snapshot of the full SegmentInventory. It records each segment’s lane, tier, level, ID, timestamp bounds, point count, and relative path.
- ReadWrite nodes write the catalog atomically after each maintenance pass.
- Compute-only nodes read the catalog periodically (controlled by `remote_segment_refresh_interval`) and never write it.
- The catalog is version-stamped (current version: 2) and validated on load. Entries with path traversal sequences (`..`, absolute paths) are rejected.
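The traversal check can be sketched as follows (a hypothetical helper, not the tsink validator):

```python
from pathlib import PurePosixPath

def is_safe_relative_path(path: str) -> bool:
    """Reject catalog entry paths that could escape the object-store root:
    absolute paths and any path containing a '..' component."""
    p = PurePosixPath(path)
    if p.is_absolute():
        return False
    return ".." not in p.parts
```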
Query routing
Each query carries a `TieredQueryPlan` that specifies which tiers to include based on the query's time range and the current tier cutoffs:
| Query time range | Tiers scanned |
|---|---|
| Entirely within hot window (`start ≥ hot_cutoff`) | Hot only |
| Overlaps warm window (`start < hot_cutoff`) | Hot + Warm |
| Overlaps cold window (`start < warm_cutoff`) | Hot + Warm + Cold |
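The routing table reduces to two comparisons on the query's start timestamp (note `warm_cutoff < hot_cutoff`, since the warm window reaches further back in time). An illustrative sketch; `tiers_for_query` is a hypothetical name, not tsink API:

```python
def tiers_for_query(start_ts, hot_cutoff, warm_cutoff):
    """Select which tiers a query must scan from its start timestamp."""
    tiers = ["hot"]              # the hot tier is always scanned
    if start_ts < hot_cutoff:
        tiers.append("warm")     # range reaches into the warm window
    if start_ts < warm_cutoff:
        tiers.append("cold")     # range reaches into the cold window
    return tiers
```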
Hot segment mirroring
Setting `with_mirror_hot_segments_to_object_store(true)` (or `--mirror-hot-segments-to-object-store true`) copies each newly flushed segment into `{object_store_root}/hot/` immediately, before the normal post-flush age-based movement occurs.
Use this when:
- You want compute-only query nodes to have immediate access to fresh data (avoiding the refresh interval lag).
- Hot data durability beyond the local disk is required.
- You are running a disaggregated storage/compute architecture where all I/O should go through the object store.
Mirrored copies follow the normal age-based lifecycle: they leave `{object_store_root}/hot/` once they age past `hot_retention_window`.
Compute-only mode
A node in `ComputeOnly` mode:
- Does not accept writes or run the WAL.
- Does not hold a local segment catalog path (no writes to the catalog).
- Reads the segment catalog from the object-store root on a periodic refresh cycle.
- Serves queries by reading segments directly from the object-store tiers.
Compute-only deployments require `object_store_path` to be set and `mirror_hot_segments_to_object_store` to be enabled on the writer node so that they can see fresh data promptly.
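The compute-only refresh cycle amounts to a TTL-cached read of the catalog file. A self-contained sketch under that assumption (`CatalogReader` is illustrative, not the tsink API; the injected `now` parameter exists only to make the TTL logic easy to exercise):

```python
import json
from pathlib import Path
import time

class CatalogReader:
    """TTL-cached reader for segment_catalog.json, modelling the
    compute-only refresh cycle (illustrative, not the tsink implementation)."""

    def __init__(self, root: Path, refresh_interval_secs: float = 5.0):
        self.path = root / "segment_catalog.json"
        self.ttl = refresh_interval_secs
        self._cached = None
        self._loaded_at = float("-inf")

    def get(self, now=None):
        now = time.monotonic() if now is None else now
        if now - self._loaded_at >= self.ttl:  # stale: re-read from the store
            self._cached = json.loads(self.path.read_text())
            self._loaded_at = now
        return self._cached                    # fresh enough: serve the cache
```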
Observability
Flush metrics (FlushObservabilitySnapshot)
| Field | Description |
|---|---|
| `tier_moves_total` | Successful tier move operations since startup |
| `tier_move_errors_total` | Failed tier move operations |
| `expired_segments_total` | Segments deleted by retention enforcement |
| `hot_segments_visible` | Current count of hot-tier segments in the index |
| `warm_segments_visible` | Current count of warm-tier segments |
| `cold_segments_visible` | Current count of cold-tier segments |
Query metrics (QueryObservabilitySnapshot)
| Field | Description |
|---|---|
| `hot_only_query_plans_total` | Queries that scanned the hot tier only |
| `warm_tier_query_plans_total` | Queries that included the warm tier |
| `cold_tier_query_plans_total` | Queries that included the cold tier |
| `hot_tier_persisted_chunks_read_total` | Chunk reads from the hot tier |
| `warm_tier_persisted_chunks_read_total` | Chunk reads from the warm tier |
| `cold_tier_persisted_chunks_read_total` | Chunk reads from the cold tier |
| `warm_tier_fetch_duration_nanos_total` | Cumulative fetch time for warm tier chunks |
| `cold_tier_fetch_duration_nanos_total` | Cumulative fetch time for cold tier chunks |
Remote storage metrics (RemoteStorageObservabilitySnapshot)
| Field | Description |
|---|---|
| `enabled` | Whether tiered storage is configured |
| `runtime_mode` | ReadWrite or ComputeOnly |
| `mirror_hot_segments` | Whether hot segment mirroring is active |
| `catalog_refreshes_total` | Total catalog refresh attempts |
| `catalog_refresh_errors_total` | Failed catalog refreshes |
| `accessible` | Whether the object store was reachable on last check |
| `last_successful_refresh_unix_ms` | Unix timestamp of last successful catalog read |
| `consecutive_refresh_failures` | Number of consecutive failures (used for backoff) |
| `backoff_active` | Whether exponential backoff is in effect |
These snapshots are exposed via the `/metrics` endpoint in the server.
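`consecutive_refresh_failures` drives exponential backoff, but the exact curve is not documented; one plausible shape, with an assumed 5 s base (the default refresh interval) and an assumed 300 s cap:

```python
def refresh_backoff(consecutive_failures, base_secs=5.0, cap_secs=300.0):
    """Exponential backoff for catalog refreshes: double the wait per
    consecutive failure, capped. Base and cap are assumptions, not tsink's
    documented values."""
    if consecutive_failures == 0:
        return base_secs  # healthy: use the normal refresh interval
    return min(cap_secs, base_secs * (2 ** consecutive_failures))
```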
Python bindings
The tiering configuration is available through the UniFFI Python bindings.
Operational notes
- Object-store root can be any path: in production this is typically a FUSE mount or network filesystem. tsink itself uses standard filesystem calls and has no direct S3/GCS SDK dependency.
- Tier moves are not reversible automatically: once a segment is in the cold tier there is no built-in promotion back to warm or hot. Adjust `hot_retention_window`/`warm_retention_window` to control placement.
- Concurrent access: multiple ReadWrite nodes pointing at the same `object_store_root` are not supported. Use the cluster mode (which distributes shards) instead of sharing a single tier root.
- Recovery at startup: on startup, the engine reads the catalog (if present) or scans all tier directories. Corrupt or unreadable segments are quarantined rather than causing a startup failure; quarantined paths are logged.
- Capacity planning: each tier directory grows monotonically until the post-flush sweep runs. Retention enforcement and compaction both reduce segment count; ensure the object-store volume has sufficient capacity for `warm_retention_window + cold_retention_window` worth of data.