Data Types & Native Histograms
This document describes every type that can be stored in a tsink data point, how mixed types are handled, how timestamps are represented, the limits that apply to metrics and labels, and how custom user types are embedded inside the bytes lane.
Table of Contents
- Core Data Model
- Value Types
- Value Lanes
- Encoding Codecs by Type
- Timestamp Precision
- Metrics and Labels
- Type Coercions
- Custom Types via the Codec Trait
- Aggregation
- Python Bindings Type Mapping
Core Data Model
The smallest unit of storage is a DataPoint, which pairs a typed value with an i64 timestamp. Multiple data points for the same named metric are grouped into a Row.
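The following is a minimal sketch of how these pieces fit together. The field names (timestamp, value, metric, data_points) and the from_samples helper are hypothetical, and Value is cut down to two variants. tsink's actual definitions may differ.

```rust
/// Cut-down stand-in for tsink's Value enum (two variants only).
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    F64(f64),
    I64(i64),
}

/// One sample: a typed value plus a raw i64 timestamp.
/// Field names are assumptions for illustration.
#[derive(Debug, Clone, PartialEq)]
pub struct DataPoint {
    pub timestamp: i64,
    pub value: Value,
}

/// Hypothetical grouping of samples that belong to one named metric.
#[derive(Debug, Clone, PartialEq)]
pub struct Row {
    pub metric: String,
    pub data_points: Vec<DataPoint>,
}

impl Row {
    /// Hypothetical helper: builds a Row from (timestamp, value) pairs.
    pub fn from_samples(metric: &str, samples: Vec<(i64, Value)>) -> Row {
        let data_points = samples
            .into_iter()
            .map(|(timestamp, value)| DataPoint { timestamp, value })
            .collect();
        Row {
            metric: metric.to_string(),
            data_points,
        }
    }
}
```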
Value Types
All sample payloads are represented by the Value enum, which has one variant per supported type: F64, I64, U64, Bool, Bytes, String, and Histogram.
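A write whose value type conflicts with the value lane already established for its series is rejected with a ValueTypeMismatch error (see Value Lanes).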
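The variant list can be pictured as the enum below. Payload types follow the per-type descriptions in this section. NativeHistogram is an empty placeholder here, and kind() is a hypothetical helper, not part of the documented API. Boxing the histogram is a common Rust way to keep the enum small, because the other payloads are at most three machine words.

```rust
/// Placeholder: the real NativeHistogram carries spans, counts, and more.
#[derive(Debug, Clone, Default, PartialEq)]
pub struct NativeHistogram;

/// Mirror of the documented variants and their payload types.
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    F64(f64),
    I64(i64),
    U64(u64),
    Bool(bool),
    Bytes(Vec<u8>),
    String(String),
    Histogram(Box<NativeHistogram>),
}

impl Value {
    /// Hypothetical helper returning the type names used in this document.
    pub fn kind(&self) -> &'static str {
        match self {
            Value::F64(_) => "float64",
            Value::I64(_) => "int64",
            Value::U64(_) => "uint64",
            Value::Bool(_) => "bool",
            Value::Bytes(_) => "bytes",
            Value::String(_) => "string",
            Value::Histogram(_) => "histogram",
        }
    }
}
```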
float64
Value::F64(f64) is the default numeric type and the only type directly consumed by the PromQL evaluator. Use it for gauges, counters, latency histograms expressed as raw floats, and any metric that needs PromQL processing.
NaN is a valid stored value and is preserved across encoding and decoding. NaN equality is defined as NaN == NaN for storage index purposes (deduplication), which diverges from IEEE 754.
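The storage-equality rule can be written as a small predicate. This sketch only illustrates the stated rule and is not tsink's implementation. The consecutive-duplicate dedup helper and its (timestamp, value) input shape are hypothetical.

```rust
/// Storage equality for f64 samples: IEEE 754 comparison, except that any
/// NaN is treated as equal to any other NaN (unlike `==`).
pub fn storage_eq(a: f64, b: f64) -> bool {
    (a.is_nan() && b.is_nan()) || a == b
}

/// Hypothetical dedup pass: drops a sample when it repeats the previous
/// kept sample's timestamp with a storage-equal value.
pub fn dedup(samples: &[(i64, f64)]) -> Vec<(i64, f64)> {
    let mut out: Vec<(i64, f64)> = Vec::new();
    for &(ts, v) in samples {
        let is_dup = matches!(
            out.last(),
            Some(&(last_ts, last_v)) if last_ts == ts && storage_eq(last_v, v)
        );
        if !is_dup {
            out.push((ts, v));
        }
    }
    out
}
```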
int64
Value::I64(i64) covers signed counters, monotonic event IDs, and any domain where integer semantics matter (no rounding at extreme values). The codec is ZigZag-encoded delta bitpack, which is efficient for slowly changing integers.
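The delta and ZigZag stages can be sketched as follows. The zigzag_* helpers use the standard ZigZag mapping. The tuple returned by zigzag_deltas (first value, mapped deltas, bit width) is a hypothetical simplification, and the final step of packing each delta into a width-bit field is omitted.

```rust
/// Standard ZigZag mapping: 0, -1, 1, -2, 2 become 0, 1, 2, 3, 4, so small
/// magnitudes of either sign turn into small unsigned numbers.
pub fn zigzag_encode(n: i64) -> u64 {
    ((n << 1) ^ (n >> 63)) as u64
}

pub fn zigzag_decode(z: u64) -> i64 {
    ((z >> 1) as i64) ^ -((z & 1) as i64)
}

/// Hypothetical pre-pack stage: keeps the first value, ZigZag-maps each
/// successive delta, and reports the bit width needed for the widest one.
pub fn zigzag_deltas(values: &[i64]) -> (i64, Vec<u64>, u32) {
    let first = values.first().copied().unwrap_or(0);
    let deltas: Vec<u64> = values
        .windows(2)
        .map(|w| zigzag_encode(w[1].wrapping_sub(w[0])))
        .collect();
    let width = deltas.iter().map(|d| 64 - d.leading_zeros()).max().unwrap_or(0);
    (first, deltas, width)
}
```

For slowly changing integers the deltas stay near zero, so the width (and therefore the packed size) stays small.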
When queried via as_f64(), the conversion succeeds only if the integer can be represented exactly as a 64-bit float (i.e., the absolute value fits in 53 bits). Large values like i64::MAX return None rather than silently rounding.
uint64
Value::U64(u64) covers unsigned accumulators and bitmask values. The codec is delta-bitpack (no ZigZag needed for non-negative deltas). The same exactness rule as i64 applies when converting to f64.
bool
Value::Bool(bool) is stored with a bitpack codec — one bit per sample. It is not coercible to f64; use a 0.0/1.0 float if you need PromQL arithmetic on boolean flags.
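One-bit-per-sample packing can be illustrated as follows. The least-significant-bit-first order is an assumption, because tsink's on-disk bit order is not specified here.

```rust
/// Packs booleans one bit per sample, least-significant bit first.
pub fn pack_bools(values: &[bool]) -> Vec<u8> {
    let mut out = vec![0u8; (values.len() + 7) / 8];
    for (i, &b) in values.iter().enumerate() {
        if b {
            out[i / 8] |= 1 << (i % 8);
        }
    }
    out
}

/// Reads `count` booleans back out of the packed bytes.
pub fn unpack_bools(bytes: &[u8], count: usize) -> Vec<bool> {
    (0..count)
        .map(|i| (bytes[i / 8] & (1 << (i % 8))) != 0)
        .collect()
}
```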
bytes
Value::Bytes(Vec<u8>) is the escape hatch for any payload that does not fit into a numeric type — serialized Protobuf, MessagePack, JSON, or any custom binary encoding. No interpretation is done on the byte payload by the engine.
string and bytes share the same blob value lane and the same on-disk codec (bytes delta block). From the engine’s perspective they are interchangeable; the distinction is only at the API boundary.
Custom Rust types can be embedded as bytes via the Codec trait — see Custom Types via the Codec Trait.
string
Value::String(String) stores UTF-8 text. Like bytes, it is stored on the blob lane using the bytes delta block codec and does not participate in numeric aggregation.
Native Histogram
Value::Histogram(Box<NativeHistogram>) stores a complete Prometheus-compatible native histogram sample. Histograms use the blob lane and are serialized with the bytes delta block codec.
NativeHistogram Structure
HistogramCount is used for the count and zero count, which can each be either an integer or a float to accommodate both classical integer counting and weighted or scaled histograms.
HistogramBucketSpan describes a contiguous run of populated buckets in the sparse representation.
HistogramResetHint indicates whether the histogram was reset before this sample:
| Variant | Meaning |
|---|---|
| Unknown | Reset status is not known |
| Yes | A reset definitely occurred |
| No | No reset occurred |
| Gauge | This is a gauge histogram (not accumulated) |
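The two enums can be pictured as follows. The HistogramCount variant names (Int, Float) are assumptions, and the reset-hint variants come from the table. is_counter_reset is a hypothetical helper showing how a reader might use the hint. When the hint is Unknown, it falls back to a decreasing-count check, which is a common heuristic.

```rust
/// Count type for the histogram count and zero count.
/// Variant names are assumptions.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum HistogramCount {
    Int(u64),
    Float(f64),
}

impl HistogramCount {
    /// Widens either representation to f64 for comparisons.
    pub fn to_f64(self) -> f64 {
        match self {
            HistogramCount::Int(n) => n as f64,
            HistogramCount::Float(x) => x,
        }
    }
}

/// The four variants from the table above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum HistogramResetHint {
    Unknown,
    Yes,
    No,
    Gauge,
}

/// Hypothetical helper: did a counter reset happen between two samples?
pub fn is_counter_reset(hint: HistogramResetHint, prev: HistogramCount, curr: HistogramCount) -> bool {
    match hint {
        HistogramResetHint::Yes => true,
        // Gauge histograms are not accumulated, so a drop is not a reset.
        HistogramResetHint::No | HistogramResetHint::Gauge => false,
        // No hint: fall back to detecting a decreasing total count.
        HistogramResetHint::Unknown => curr.to_f64() < prev.to_f64(),
    }
}
```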
Sparse bucket encoding
Buckets are stored sparsely as a sequence of (span, deltas/counts) pairs. The negative_spans/positive_spans arrays describe which bucket slots are populated; negative_deltas/positive_deltas give the delta-encoded integer counts for each slot; negative_counts/positive_counts give the absolute float counts (used instead of deltas when any count is non-integer).
Either the *_deltas fields or the *_counts fields are populated for a given sample — not both. When float counts are present, *_deltas should be empty, and vice versa.
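Expanding spans and deltas back into absolute buckets can be sketched as follows. This assumes the span semantics of the Prometheus native histogram format: the first span's offset is an absolute bucket index, each later offset is the gap after the previous span's last bucket, and each delta is relative to the previous populated bucket's count, starting from 0. The field names offset and length are assumptions.

```rust
/// Assumed span layout (Prometheus-style): `offset` then `length` buckets.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct HistogramBucketSpan {
    pub offset: i32,
    pub length: u32,
}

/// Expands sparse spans plus delta-encoded integer counts into
/// (bucket index, absolute count) pairs.
pub fn expand_deltas(spans: &[HistogramBucketSpan], deltas: &[i64]) -> Vec<(i32, i64)> {
    let mut out = Vec::with_capacity(deltas.len());
    let mut deltas = deltas.iter();
    let mut index: i32 = 0;
    let mut count: i64 = 0;
    for span in spans {
        // First span: absolute start. Later spans: skip `offset` empty buckets.
        index += span.offset;
        for _ in 0..span.length {
            match deltas.next() {
                Some(delta) => {
                    count += delta; // running sum turns deltas into counts
                    out.push((index, count));
                }
                None => return out, // fewer deltas than slots: stop early
            }
            index += 1;
        }
    }
    out
}
```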
NaN semantics
For purposes of stored equality (used by deduplication and the WAL), NaN values inside a histogram are considered equal to other NaN values of the same sign.
Value Lanes
Internally, every series is assigned to one of two mutually exclusive value lanes based on the type of its first ingested sample:
| Lane | Types | On-disk directory |
|---|---|---|
| Numeric | f64, i64, u64, bool | lane_numeric/ |
| Blob | bytes, string, NativeHistogram | lane_blob/ |
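Once a series has been assigned to a lane, writing a sample whose type belongs to the other lane returns a ValueTypeMismatch error.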
Keeping numeric and blob data physically separate allows their compaction jobs to run independently and avoids mixing integer/float codecs with variable-length blob codecs in the same segment file.
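A sketch of lane pinning follows, assuming the check happens per series at write time. Lane, LaneRegistry, and the error's fields are hypothetical. The sketch checks only the lane. This section does not say whether tsink also rejects, for example, an i64 written to a series that started as f64.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Lane {
    Numeric,
    Blob,
}

/// Reduced stand-in for Value (histogram payload elided).
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    F64(f64),
    I64(i64),
    U64(u64),
    Bool(bool),
    Bytes(Vec<u8>),
    String(String),
    Histogram,
}

impl Value {
    /// Lane assignment from the table above.
    pub fn lane(&self) -> Lane {
        match self {
            Value::F64(_) | Value::I64(_) | Value::U64(_) | Value::Bool(_) => Lane::Numeric,
            Value::Bytes(_) | Value::String(_) | Value::Histogram => Lane::Blob,
        }
    }
}

#[derive(Debug, PartialEq, Eq)]
pub enum WriteError {
    ValueTypeMismatch { series: String, expected: Lane, got: Lane },
}

/// Hypothetical registry that pins each series to the lane of its first sample.
#[derive(Default)]
pub struct LaneRegistry {
    lanes: HashMap<String, Lane>,
}

impl LaneRegistry {
    pub fn check(&mut self, series: &str, value: &Value) -> Result<Lane, WriteError> {
        let got = value.lane();
        // The first write for a series records its lane; later writes must match.
        let expected = *self.lanes.entry(series.to_string()).or_insert(got);
        if expected == got {
            Ok(got)
        } else {
            Err(WriteError::ValueTypeMismatch { series: series.to_string(), expected, got })
        }
    }
}
```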
Encoding Codecs by Type
tsink selects the most compact codec automatically for each chunk at flush time. The codec choice is stored in the chunk header and used verbatim during reads, so no re-encoding occurs on read. Timestamp codecs are chosen independently from value codecs:
| Codec | When selected |
|---|---|
| FixedStepRle | All timestamps are evenly spaced (constant scrape interval) |
| DeltaOfDeltaBitpack | Timestamps have a slowly drifting interval |
| DeltaVarint | Irregular timestamps; always applicable as a fallback |
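A hypothetical selector for the three timestamp codecs is sketched below. tsink's real thresholds are not documented here. The max_dod_bits parameter is an assumed knob for what counts as "slowly drifting": it sets the widest ZigZag-encoded delta-of-delta allowed before the selector falls back to DeltaVarint.

```rust
/// Timestamp codecs named in the table above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TimestampCodec {
    FixedStepRle,
    DeltaOfDeltaBitpack,
    DeltaVarint,
}

/// Hypothetical selector; `max_dod_bits` is an assumed threshold.
pub fn choose_timestamp_codec(ts: &[i64], max_dod_bits: u32) -> TimestampCodec {
    let deltas: Vec<i64> = ts.windows(2).map(|w| w[1].wrapping_sub(w[0])).collect();
    // Every gap identical (or fewer than two gaps): one step describes the chunk.
    if deltas.windows(2).all(|w| w[0] == w[1]) {
        return TimestampCodec::FixedStepRle;
    }
    // Bit width of the largest delta-of-delta after ZigZag mapping.
    let widest = deltas
        .windows(2)
        .map(|w| {
            let dod = w[1].wrapping_sub(w[0]);
            let zigzag = ((dod << 1) ^ (dod >> 63)) as u64;
            64 - zigzag.leading_zeros()
        })
        .max()
        .unwrap_or(0);
    if widest <= max_dod_bits {
        TimestampCodec::DeltaOfDeltaBitpack
    } else {
        // Irregular spacing: DeltaVarint always works as a fallback.
        TimestampCodec::DeltaVarint
    }
}
```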
Value codecs are selected by type:
| Type | Codec | Notes |
|---|---|---|
| f64 | Gorilla XOR | Facebook Gorilla XOR-based float compression |
| i64 | ZigZag delta bitpack | Maps signed deltas to unsigned, then bitpacks |
| u64 | Delta bitpack | Non-negative deltas bitpacked directly |
| bool | Bit-pack | 1 bit per sample |
| bytes / string / NativeHistogram | Bytes delta block | Variable-length records with delta compression |
| Any type | Constant RLE | Applied when all values in a chunk are identical; takes priority over type-specific codecs |
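The priority rule in the last row can be sketched as a per-chunk selector. The sketch assumes a chunk holds a single value type and omits the histogram variant. The ValueCodec names are illustrative labels for the table rows, not tsink identifiers. The "identical" check reuses the f64 storage-equality rule, so an all-NaN chunk still counts as constant.

```rust
/// Cut-down Value (histogram omitted).
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    F64(f64),
    I64(i64),
    U64(u64),
    Bool(bool),
    Bytes(Vec<u8>),
    String(String),
}

/// Illustrative labels for the codecs in the table above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ValueCodec {
    ConstantRle,
    GorillaXor,
    ZigZagDeltaBitpack,
    DeltaBitpack,
    BitPack,
    BytesDeltaBlock,
}

/// Equality for the "all values identical" check; NaN matches NaN.
fn same(a: &Value, b: &Value) -> bool {
    match (a, b) {
        (Value::F64(x), Value::F64(y)) => (x.is_nan() && y.is_nan()) || x == y,
        _ => a == b,
    }
}

/// Hypothetical per-chunk selector; returns None for an empty chunk.
pub fn choose_value_codec(chunk: &[Value]) -> Option<ValueCodec> {
    let first = chunk.first()?;
    // Constant RLE takes priority over every type-specific codec.
    if chunk.iter().all(|v| same(v, first)) {
        return Some(ValueCodec::ConstantRle);
    }
    Some(match first {
        Value::F64(_) => ValueCodec::GorillaXor,
        Value::I64(_) => ValueCodec::ZigZagDeltaBitpack,
        Value::U64(_) => ValueCodec::DeltaBitpack,
        Value::Bool(_) => ValueCodec::BitPack,
        Value::Bytes(_) | Value::String(_) => ValueCodec::BytesDeltaBlock,
    })
}
```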
Timestamp Precision
The TimestampPrecision configuration setting tells the engine how to interpret the i64 timestamp in each DataPoint:
| Variant | Unit | Maximum year |
|---|---|---|
| Nanoseconds | 1 ns | ~2262 |
| Microseconds | 1 µs | ~294246 |
| Milliseconds | 1 ms | ~292278994 |
| Seconds | 1 s | ~292277026596 |
The precision is configured on the StorageBuilder and applies to all timestamps written to that storage instance. Mixing precisions within a single instance is not supported: timestamps from different precisions are not automatically renormalized.
The DataPoint::new constructor accepts the timestamp as a raw i64; callers are responsible for ensuring the value matches the configured precision.
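Callers must supply timestamps in the configured unit, so a helper like the one below can rescale values before they reach DataPoint::new. TimestampPrecision is mirrored locally so the sketch compiles on its own. convert and now_in are hypothetical helpers, not tsink API. Flooring when converting to a coarser unit is a design choice made for this sketch.

```rust
use std::time::{SystemTime, UNIX_EPOCH};

/// Local mirror of the four precision variants in the table above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TimestampPrecision {
    Nanoseconds,
    Microseconds,
    Milliseconds,
    Seconds,
}

impl TimestampPrecision {
    /// Nanoseconds per tick at this precision.
    fn nanos_per_unit(self) -> i64 {
        match self {
            TimestampPrecision::Nanoseconds => 1,
            TimestampPrecision::Microseconds => 1_000,
            TimestampPrecision::Milliseconds => 1_000_000,
            TimestampPrecision::Seconds => 1_000_000_000,
        }
    }
}

/// Rescales a raw timestamp between precisions. Coarsening floors toward
/// negative infinity; refining returns None on i64 overflow.
pub fn convert(ts: i64, from: TimestampPrecision, to: TimestampPrecision) -> Option<i64> {
    let (f, t) = (from.nanos_per_unit(), to.nanos_per_unit());
    if f >= t {
        ts.checked_mul(f / t)
    } else {
        Some(ts.div_euclid(t / f))
    }
}

/// Current wall-clock time as a raw i64 in the given precision.
pub fn now_in(precision: TimestampPrecision) -> i64 {
    let nanos = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .expect("system clock is before 1970")
        .as_nanos();
    (nanos / precision.nanos_per_unit() as u128) as i64
}
```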
Metrics and Labels
Metric name
A metric name is an arbitrary UTF-8 string. The only restriction is length:
| Limit | Value |
|---|---|
| Maximum metric name length | 65 535 bytes (u16::MAX) |
Labels
A label is a UTF-8 key-value pair. Both name and value must be non-empty. Length limits:
| Field | Limit |
|---|---|
| Label name | 256 bytes |
| Label value | 16 384 bytes (16 KiB) |
Label order is not significant: labels are sorted before the series identity is computed, so {a="1", b="2"} and {b="2", a="1"} refer to the same series.
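The limits above can be enforced with a pre-write check like the one below. validate_and_sort and ValidationError are hypothetical. The sketch treats the limits as inclusive maxima in UTF-8 bytes and does not handle duplicate label names, whose treatment is not described here.

```rust
/// Limits from the tables above.
pub const MAX_METRIC_NAME_LEN: usize = u16::MAX as usize; // 65 535 bytes
pub const MAX_LABEL_NAME_LEN: usize = 256;
pub const MAX_LABEL_VALUE_LEN: usize = 16 * 1024; // 16 KiB

#[derive(Debug, PartialEq, Eq)]
pub enum ValidationError {
    MetricNameTooLong(usize),
    EmptyLabelName,
    EmptyLabelValue,
    LabelNameTooLong(usize),
    LabelValueTooLong(usize),
}

/// Hypothetical check: enforces the non-empty rule and byte limits, then
/// sorts labels so that input order does not affect series identity.
pub fn validate_and_sort(
    metric: &str,
    labels: &[(&str, &str)],
) -> Result<Vec<(String, String)>, ValidationError> {
    // `str::len` counts UTF-8 bytes, the unit the limits are expressed in.
    if metric.len() > MAX_METRIC_NAME_LEN {
        return Err(ValidationError::MetricNameTooLong(metric.len()));
    }
    let mut out = Vec::with_capacity(labels.len());
    for &(name, value) in labels {
        if name.is_empty() {
            return Err(ValidationError::EmptyLabelName);
        }
        if value.is_empty() {
            return Err(ValidationError::EmptyLabelValue);
        }
        if name.len() > MAX_LABEL_NAME_LEN {
            return Err(ValidationError::LabelNameTooLong(name.len()));
        }
        if value.len() > MAX_LABEL_VALUE_LEN {
            return Err(ValidationError::LabelValueTooLong(value.len()));
        }
        out.push((name.to_string(), value.to_string()));
    }
    out.sort();
    Ok(out)
}
```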
Series identity
The engine assigns each unique (metric, sorted-labels) combination a 64-bit SeriesId. The identity is computed as a stable FNV-1a hash over a canonical binary encoding of the metric name and sorted label pairs. This hash is used internally; callers always identify series by metric name and labels.
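FNV-1a itself is standard (64-bit offset basis 0xcbf29ce484222325, prime 0x100000001b3). The canonical encoding below is an assumption: a u32 little-endian length prefix before each string, with labels sorted first. Ids computed this way will not match tsink's real SeriesId values. The sketch shows why sorting makes label order irrelevant and why length prefixes keep ("ab", "c") distinct from ("a", "bc").

```rust
const FNV_OFFSET_BASIS: u64 = 0xcbf2_9ce4_8422_2325;
const FNV_PRIME: u64 = 0x0000_0100_0000_01b3;

/// Standard 64-bit FNV-1a: XOR each byte in, then multiply by the prime.
pub fn fnv1a(bytes: &[u8]) -> u64 {
    let mut hash = FNV_OFFSET_BASIS;
    for &b in bytes {
        hash ^= b as u64;
        hash = hash.wrapping_mul(FNV_PRIME);
    }
    hash
}

/// Hypothetical canonical encoding: sorted labels, length-prefixed strings.
fn canonical_bytes(metric: &str, labels: &[(&str, &str)]) -> Vec<u8> {
    let mut sorted = labels.to_vec();
    sorted.sort();
    let mut buf = Vec::new();
    let mut put = |s: &str| {
        buf.extend_from_slice(&(s.len() as u32).to_le_bytes());
        buf.extend_from_slice(s.as_bytes());
    };
    put(metric);
    for &(name, value) in sorted.iter() {
        put(name);
        put(value);
    }
    buf
}

/// Hypothetical series id over the canonical encoding.
pub fn series_id(metric: &str, labels: &[(&str, &str)]) -> u64 {
    fnv1a(&canonical_bytes(metric, labels))
}
```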
Type Coercions
The Value type exposes explicit, lossless conversion accessors:
| Method | Returns | Applies to |
|---|---|---|
| as_f64() | Option<f64> | F64, I64 (if fits in 53 bits), U64 (if fits in 53 bits) |
| as_i64() | Option<i64> | I64, U64 (if fits in i64) |
| as_u64() | Option<u64> | U64, I64 (if non-negative and fits in u64) |
| as_bool() | Option<bool> | Bool only |
| as_bytes() | Option<&[u8]> | Bytes only |
| as_str() | Option<&str> | String only |
| as_histogram() | Option<&NativeHistogram> | Histogram only |
as_f64() precision note: an i64 or u64 value is only converted if its absolute value can be represented exactly with 53 mantissa bits (the precision of f64). Values like i64::MAX (63 significant bits) return None. This prevents silent precision loss in numeric pipelines.
The PromQL evaluator calls as_f64() on every sample. Series whose values cannot be losslessly represented as f64 — including bool, bytes, string, and oversized integers — are excluded from PromQL evaluation.
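The accessor rules for the numeric variants can be sketched using a cut-down Value without the histogram variant. The sketch reads "fits in 53 bits" literally as a magnitude below 2^53. This section does not say whether tsink also accepts exactly 2^53, or larger values that happen to be exact such as 2^60, so treat that boundary as an assumption.

```rust
/// Cut-down mirror of Value (histogram omitted).
#[derive(Debug, Clone, PartialEq)]
pub enum Value {
    F64(f64),
    I64(i64),
    U64(u64),
    Bool(bool),
    Bytes(Vec<u8>),
    String(String),
}

/// Magnitudes below 2^53 fit in 53 bits and convert to f64 exactly.
const F64_EXACT_LIMIT: u64 = 1 << 53;

impl Value {
    pub fn as_f64(&self) -> Option<f64> {
        match *self {
            Value::F64(x) => Some(x),
            // Reject integers needing more than 53 bits instead of rounding.
            Value::I64(n) if n.unsigned_abs() < F64_EXACT_LIMIT => Some(n as f64),
            Value::U64(n) if n < F64_EXACT_LIMIT => Some(n as f64),
            _ => None, // Bool, Bytes, String, oversized integers
        }
    }

    pub fn as_i64(&self) -> Option<i64> {
        match *self {
            Value::I64(n) => Some(n),
            Value::U64(n) if n <= i64::MAX as u64 => Some(n as i64),
            _ => None,
        }
    }

    pub fn as_u64(&self) -> Option<u64> {
        match *self {
            Value::U64(n) => Some(n),
            Value::I64(n) if n >= 0 => Some(n as u64),
            _ => None,
        }
    }
}
```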
Custom Types via the Codec Trait
Any Rust type can be stored inside the bytes lane by implementing the Codec trait, which converts values of the type to and from a byte payload.
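The following is only a hypothetical shape for such a trait: an associated Item type with encode and decode methods. The GeoPoint type and its fixed 16-byte little-endian layout are illustrative. Check tsink's actual Codec trait for the real method names, signatures, and error types.

```rust
/// Hypothetical shape of a codec trait; tsink's real signature may differ.
pub trait Codec {
    type Item;
    fn encode(&self, value: &Self::Item) -> Vec<u8>;
    fn decode(&self, bytes: &[u8]) -> Option<Self::Item>;
}

/// Example custom type to be stored as Value::Bytes.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct GeoPoint {
    pub lat: f64,
    pub lon: f64,
}

pub struct GeoPointCodec;

/// Reads one little-endian f64 from an 8-byte slice.
fn read_f64_le(bytes: &[u8]) -> f64 {
    let mut buf = [0u8; 8];
    buf.copy_from_slice(bytes);
    f64::from_le_bytes(buf)
}

impl Codec for GeoPointCodec {
    type Item = GeoPoint;

    /// Fixed layout: 8 bytes of latitude, then 8 bytes of longitude.
    fn encode(&self, p: &GeoPoint) -> Vec<u8> {
        let mut out = Vec::with_capacity(16);
        out.extend_from_slice(&p.lat.to_le_bytes());
        out.extend_from_slice(&p.lon.to_le_bytes());
        out
    }

    fn decode(&self, bytes: &[u8]) -> Option<GeoPoint> {
        if bytes.len() != 16 {
            return None; // not a payload produced by this codec
        }
        Some(GeoPoint {
            lat: read_f64_le(&bytes[..8]),
            lon: read_f64_le(&bytes[8..]),
        })
    }
}
```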
Custom aggregation
To aggregate custom-typed series at query time, pair a Codec with an Aggregator.
The CodecAggregator<C, A> adapter bridges a Codec and a typed Aggregator into the BytesAggregation interface expected by QueryOptions::custom_aggregation.
CodecAggregator decodes each Value::Bytes point using the codec, passes all decoded values to the aggregator, encodes the result back to bytes, and attaches the bucket start timestamp to the aggregate point.
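The decode, aggregate, and encode flow can be sketched as shown below. Codec, Aggregator, CodecAggregator, and aggregate_bucket are hypothetical stand-ins that mirror the description, not tsink's signatures. In particular, failing the whole bucket when any payload fails to decode is an assumption.

```rust
/// Hypothetical traits mirroring the roles described above.
pub trait Codec {
    type Item;
    fn encode(&self, value: &Self::Item) -> Vec<u8>;
    fn decode(&self, bytes: &[u8]) -> Option<Self::Item>;
}

pub trait Aggregator<T> {
    fn aggregate(&self, values: &[T]) -> Option<T>;
}

/// Hypothetical adapter: decode every point, aggregate, re-encode.
pub struct CodecAggregator<C, A> {
    pub codec: C,
    pub aggregator: A,
}

impl<C: Codec, A: Aggregator<C::Item>> CodecAggregator<C, A> {
    /// `points` are (timestamp, payload) pairs from one bucket; the result
    /// is stamped with the bucket start timestamp.
    pub fn aggregate_bucket(&self, bucket_start: i64, points: &[(i64, Vec<u8>)]) -> Option<(i64, Vec<u8>)> {
        // Assumption: one undecodable payload fails the whole bucket.
        let decoded: Vec<C::Item> = points
            .iter()
            .map(|(_, bytes)| self.codec.decode(bytes))
            .collect::<Option<Vec<_>>>()?;
        let result = self.aggregator.aggregate(&decoded)?;
        Some((bucket_start, self.codec.encode(&result)))
    }
}

/// Example codec (u32, little-endian) and aggregator (maximum).
pub struct U32Codec;

impl Codec for U32Codec {
    type Item = u32;
    fn encode(&self, value: &u32) -> Vec<u8> {
        value.to_le_bytes().to_vec()
    }
    fn decode(&self, bytes: &[u8]) -> Option<u32> {
        if bytes.len() != 4 {
            return None;
        }
        Some(u32::from_le_bytes([bytes[0], bytes[1], bytes[2], bytes[3]]))
    }
}

pub struct MaxAggregator;

impl Aggregator<u32> for MaxAggregator {
    fn aggregate(&self, values: &[u32]) -> Option<u32> {
        values.iter().copied().max()
    }
}
```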
Aggregation
When using the built-in Aggregation enum with QueryOptions, the engine aggregates numeric values (f64, and integers coercible to f64) over the query time range, or per bucket when downsampling is enabled:
| Variant | Operation |
|---|---|
| None | No aggregation; raw samples are returned |
| Sum | Sum of all values |
| Min | Minimum value |
| Max | Maximum value |
| Avg | Mean |
| First | Earliest sample in the window |
| Last | Latest sample in the window |
| Count | Number of samples |
| Median | Median (50th percentile) |
| Range | max - min |
| Variance | Population variance |
| StdDev | Population standard deviation |
All built-in variants operate on the f64 projection of a value (Value::as_f64()). Bytes, String, and NativeHistogram series are not aggregated by the built-in variants; use custom_aggregation for those types.
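The arithmetic behind the reducing variants can be sketched over samples that have already been projected to f64. The aggregate function and the local Aggregation enum are hypothetical mirrors, and the enum omits None because None returns raw samples. The sketch makes three assumptions: empty windows yield None (Count yields 0), Median averages the two middle values for an even count, and First/Last are chosen by timestamp.

```rust
/// Reducing variants from the table above (`None` omitted).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Aggregation {
    Sum, Min, Max, Avg, First, Last, Count, Median, Range, Variance, StdDev,
}

/// Hypothetical reducer over (timestamp, value) samples in one window.
pub fn aggregate(samples: &[(i64, f64)], agg: Aggregation) -> Option<f64> {
    if samples.is_empty() {
        return if agg == Aggregation::Count { Some(0.0) } else { None };
    }
    let values: Vec<f64> = samples.iter().map(|&(_, v)| v).collect();
    let n = values.len() as f64;
    let sum: f64 = values.iter().sum();
    let mean = sum / n;
    let min = values.iter().copied().fold(f64::INFINITY, f64::min);
    let max = values.iter().copied().fold(f64::NEG_INFINITY, f64::max);
    // Population variance: divide by n, not n - 1.
    let variance = values.iter().map(|v| (v - mean).powi(2)).sum::<f64>() / n;
    let result = match agg {
        Aggregation::Sum => sum,
        Aggregation::Min => min,
        Aggregation::Max => max,
        Aggregation::Avg => mean,
        // First/Last pick by timestamp, not by input order.
        Aggregation::First => samples.iter().min_by_key(|s| s.0)?.1,
        Aggregation::Last => samples.iter().max_by_key(|s| s.0)?.1,
        Aggregation::Count => n,
        Aggregation::Median => {
            let mut sorted = values.clone();
            sorted.sort_by(|a, b| a.total_cmp(b));
            let mid = sorted.len() / 2;
            if sorted.len() % 2 == 0 {
                (sorted[mid - 1] + sorted[mid]) / 2.0
            } else {
                sorted[mid]
            }
        }
        Aggregation::Range => max - min,
        Aggregation::Variance => variance,
        Aggregation::StdDev => variance.sqrt(),
    };
    Some(result)
}
```

For the values 2, 4, 4, 4, 5, 5, 7, 9 this gives a mean of 5, a population variance of 4, and a standard deviation of 2.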
Python Bindings Type Mapping
The UniFFI Python bindings expose the same model names as the Rust API. The mapping is direct:
| Python type | Rust equivalent |
|---|---|
| Value (enum) | Value |
| DataPoint | DataPoint |
| Row | Row |
| Label | Label |
| NativeHistogram | NativeHistogram |
| HistogramBucketSpan | HistogramBucketSpan |
| HistogramCount (enum) | HistogramCount |
| HistogramResetHint (enum) | HistogramResetHint |
Value is a tagged-union enum with named fields per variant. The string variant is named Str (not String) to avoid conflicting with the Python built-in.