Data Types & Native Histograms

This document describes every type that can be stored in a tsink data point, how mixed types are handled, how timestamps are represented, the limits that apply to metrics and labels, and how custom user types are embedded as bytes on the blob lane.

Table of Contents

  1. Core Data Model
  2. Value Types
  3. Value Lanes
  4. Encoding Codecs by Type
  5. Timestamp Precision
  6. Metrics and Labels
  7. Type Coercions
  8. Custom Types via the Codec Trait
  9. Aggregation
  10. Python Bindings Type Mapping

Core Data Model

The smallest unit of storage is a DataPoint, which pairs a typed value with an i64 timestamp. A Row attaches a single data point to a metric name and its labels:
```rust
pub struct DataPoint {
    pub value: Value,
    pub timestamp: i64,
}

pub struct Row {
    // metric name
    // labels (key-value pairs)
    // data_point
}
```
A series is identified by a metric name together with the full, sorted set of label key-value pairs. Two rows with the same metric name but different labels belong to different series.

Value Types

All sample payloads are represented by the Value enum:
```rust
pub enum Value {
    F64(f64),
    I64(i64),
    U64(u64),
    Bool(bool),
    Bytes(Vec<u8>),
    String(String),
    Histogram(Box<NativeHistogram>),
}
```
Every variant in a chunk must belong to the same value family (see Value Lanes). Mixing different numeric variants, or mixing a numeric variant with a blob variant, in the same write batch returns a ValueTypeMismatch error.

float64

Value::F64(f64) is the default numeric type and the only type directly consumed by the PromQL evaluator. Use it for gauges, counters, latency histograms expressed as raw floats, and any metric that needs PromQL processing. NaN is a valid stored value and is preserved across encoding and decoding. NaN equality is defined as NaN == NaN for storage index purposes (deduplication), which diverges from IEEE 754.

int64

Value::I64(i64) covers signed counters, monotonic event IDs, and any domain where integer semantics matter (no rounding at extreme values). The codec is ZigZag-encoded delta bitpack, which is efficient for slowly changing integers. When queried via as_f64(), the conversion succeeds only if the integer can be represented exactly as a 64-bit float (i.e., the absolute value fits in 53 bits). Large values like i64::MAX return None rather than silently rounding.
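
To make the ZigZag delta bitpack idea concrete, here is a minimal sketch (not tsink's actual codec or bit layout): the first sample is kept verbatim, each later sample becomes a ZigZag-mapped delta, and the block would then be bitpacked at the width of its widest mapped delta. All function names are hypothetical.

```rust
/// ZigZag-map a signed delta so small magnitudes of either sign become small
/// unsigned numbers: 0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, ...
fn zigzag(v: i64) -> u64 {
    ((v << 1) ^ (v >> 63)) as u64
}

/// Inverse of `zigzag`.
fn unzigzag(u: u64) -> i64 {
    ((u >> 1) as i64) ^ -((u & 1) as i64)
}

/// Keep the first sample verbatim and ZigZag-map every later delta.
/// Wrapping arithmetic keeps extreme jumps (e.g. i64::MIN -> i64::MAX) lossless.
fn encode_i64_block(values: &[i64]) -> Option<(i64, Vec<u64>)> {
    let (&first, rest) = values.split_first()?;
    let mut prev = first;
    let deltas: Vec<u64> = rest
        .iter()
        .map(|&v| {
            let d = v.wrapping_sub(prev);
            prev = v;
            zigzag(d)
        })
        .collect();
    Some((first, deltas))
}

/// Bit width needed to bitpack every mapped delta in the block.
fn bit_width(deltas: &[u64]) -> u32 {
    deltas.iter().map(|d| 64 - d.leading_zeros()).max().unwrap_or(0)
}

/// Rebuild the original samples from the first value and the mapped deltas.
fn decode_i64_block(first: i64, deltas: &[u64]) -> Vec<i64> {
    let mut out = Vec::with_capacity(deltas.len() + 1);
    let mut cur = first;
    out.push(cur);
    for &z in deltas {
        cur = cur.wrapping_add(unzigzag(z));
        out.push(cur);
    }
    out
}
```

For the samples 100, 101, 99, 99 the mapped deltas are 2, 3, 0, so each fits in 2 bits; a real codec would also have to record that width.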

uint64

Value::U64(u64) covers unsigned accumulators and bitmask values. The codec is delta-bitpack (no ZigZag needed for non-negative deltas). The same exactness rule as i64 applies when converting to f64.

bool

Value::Bool(bool) is stored with a bitpack codec — one bit per sample. It is not coercible to f64; use a 0.0/1.0 float if you need PromQL arithmetic on boolean flags.
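
A sketch of a one-bit-per-sample layout. The least-significant-bit-first order and the separately stored sample count are assumptions for illustration, not tsink's documented on-disk format.

```rust
/// Pack booleans into bytes, one bit per sample, least-significant bit first.
fn pack_bools(values: &[bool]) -> Vec<u8> {
    let mut out = vec![0u8; (values.len() + 7) / 8];
    for (i, &b) in values.iter().enumerate() {
        if b {
            out[i / 8] |= 1u8 << (i % 8);
        }
    }
    out
}

/// Unpack `len` booleans. The count must be stored separately because the
/// last byte may contain padding bits.
fn unpack_bools(bytes: &[u8], len: usize) -> Vec<bool> {
    (0..len).map(|i| (bytes[i / 8] >> (i % 8)) & 1 == 1).collect()
}
```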

bytes

Value::Bytes(Vec<u8>) is the escape hatch for any payload that does not fit a numeric type: serialized Protobuf, MessagePack, JSON, or any custom binary encoding. The engine does not interpret the payload. Value::Bytes and Value::String share the same blob value lane and the same on-disk codec (bytes delta block), so from the engine's perspective they are interchangeable; the distinction exists only at the API boundary. Custom Rust types can be embedded as bytes via the Codec trait (see Custom Types via the Codec Trait).

string

Value::String(String) stores UTF-8 text. Like bytes, it is stored on the blob lane using the bytes delta block codec and does not participate in numeric aggregation.

Native Histogram

Value::Histogram(Box<NativeHistogram>) stores a complete Prometheus-compatible native histogram sample. Histograms use the blob lane and are serialized with the bytes delta block codec.

NativeHistogram Structure

```rust
pub struct NativeHistogram {
    // Total sample count. Either an integer or a float (for weighted observations).
    pub count: Option<HistogramCount>,
    // Sum of all observed values.
    pub sum: f64,
    // Bucket schema: -4..=8 for exponential buckets, or -53 for custom buckets.
    pub schema: i32,
    // Half-width of the zero bucket.
    pub zero_threshold: f64,
    // Count of samples in the zero bucket.
    pub zero_count: Option<HistogramCount>,

    // Negative-side sparse buckets.
    pub negative_spans: Vec<HistogramBucketSpan>,
    // Delta-encoded bucket counts (integer mode).
    pub negative_deltas: Vec<i64>,
    // Absolute bucket counts (float mode).
    pub negative_counts: Vec<f64>,

    // Positive-side sparse buckets (same layout as the negative side).
    pub positive_spans: Vec<HistogramBucketSpan>,
    pub positive_deltas: Vec<i64>,
    pub positive_counts: Vec<f64>,

    // Reset hint communicated to downstream consumers.
    pub reset_hint: HistogramResetHint,
    // Explicit bucket boundaries for the custom schema (schema = -53).
    pub custom_values: Vec<f64>,
}
```
HistogramCount — count and zero count can be either integer or float to accommodate both classical integer counting and weighted/scaled histograms:
```rust
pub enum HistogramCount {
    Int(u64),
    Float(f64),
}
```
HistogramBucketSpan — describes a contiguous run of populated buckets in the sparse representation:
```rust
pub struct HistogramBucketSpan {
    pub offset: i32,  // first span: starting bucket index; later spans: gap after the previous span's end
    pub length: u32,  // number of consecutive populated buckets
}
```
HistogramResetHint — indicates whether the histogram was reset before this sample:
| Variant | Meaning |
|---|---|
| Unknown | Reset status is not known |
| Yes | A reset definitely occurred |
| No | No reset occurred |
| Gauge | This is a gauge histogram (not accumulated) |

Sparse bucket encoding

Buckets are stored sparsely. The negative_spans/positive_spans arrays describe which bucket indices are populated, and the matching count arrays hold one entry per populated bucket, in span order: negative_deltas/positive_deltas carry delta-encoded integer counts, while negative_counts/positive_counts carry absolute float counts (used instead of deltas when any count is non-integer). A given sample populates either the *_deltas fields or the *_counts fields, never both; the unused pair must be empty.
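
The following sketch expands spans plus integer deltas into absolute per-bucket counts. It assumes tsink follows the Prometheus native-histogram conventions (the type is described as Prometheus-compatible): the first delta is an absolute count, and each later delta is relative to the previous populated bucket, across span boundaries. `Span` and `expand_int_buckets` are hypothetical names.

```rust
/// Local stand-in for HistogramBucketSpan.
struct Span {
    offset: i32,
    length: u32,
}

/// Expand spans + integer deltas into (bucket_index, absolute_count) pairs.
fn expand_int_buckets(spans: &[Span], deltas: &[i64]) -> Result<Vec<(i32, u64)>, String> {
    let mut out = Vec::with_capacity(deltas.len());
    let mut deltas = deltas.iter();
    let mut next_index = 0i32; // where a span with offset 0 would start
    let mut count = 0i64; // running absolute count
    for span in spans {
        // First span: offset is the starting index. Later spans: offset is the
        // number of empty buckets after the previous span's end.
        let start = next_index + span.offset;
        for i in 0..span.length {
            let d = deltas.next().ok_or("fewer deltas than populated buckets")?;
            count += *d;
            if count < 0 {
                return Err(format!("negative count at bucket {}", start + i as i32));
            }
            out.push((start + i as i32, count as u64));
        }
        next_index = start + span.length as i32;
    }
    if deltas.next().is_some() {
        return Err("more deltas than populated buckets".to_string());
    }
    Ok(out)
}
```

With one span at offset 0 and length 2 plus deltas [3, 2] (the positive side of the Python example further down), this yields buckets 0 and 1 with counts 3 and 5.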

NaN semantics

For purposes of stored equality (used by deduplication and the WAL), NaN values inside a histogram are considered equal to other NaN values of the same sign.
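
A sketch of that equality rule for individual f64 values and for bucket-count slices. Comparing non-NaN values with ordinary `==` (so 0.0 equals -0.0) is an assumption; the engine may compare raw bit patterns instead.

```rust
/// NaN-aware equality for stored values: NaN equals NaN of the same sign.
fn stored_eq(a: f64, b: f64) -> bool {
    match (a.is_nan(), b.is_nan()) {
        (true, true) => a.is_sign_negative() == b.is_sign_negative(),
        (false, false) => a == b, // assumption: ordinary IEEE 754 comparison otherwise
        _ => false,
    }
}

/// Element-wise stored equality, e.g. for positive_counts / negative_counts.
fn stored_eq_slice(a: &[f64], b: &[f64]) -> bool {
    a.len() == b.len() && a.iter().zip(b).all(|(&x, &y)| stored_eq(x, y))
}
```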

Value Lanes

Internally, every series is assigned to one of two mutually exclusive value lanes based on the type of its first ingested sample:
| Lane | Types | On-disk directory |
|---|---|---|
| Numeric | f64, i64, u64, bool | lane_numeric/ |
| Blob | bytes, string, NativeHistogram | lane_blob/ |
The lane is derived at ingest time and persisted in the series registry. Once a series is assigned to a lane, all subsequent writes must use a compatible value type. Writing a numeric type to a blob-lane series, or vice versa, returns a ValueTypeMismatch error. Keeping numeric and blob data physically separate allows their compaction jobs to run independently and avoids mixing integer/float codecs with variable-length blob codecs in the same segment file.
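
The lane rule can be modelled with a trimmed-down local copy of the Value enum (the histogram payload is replaced by a unit placeholder). This hypothetical check covers only lane compatibility; as noted under Value Types, the engine may also reject mixing different variants within one lane.

```rust
/// Trimmed-down local mirror of tsink's Value enum; the histogram payload is
/// replaced by a unit placeholder to keep the sketch self-contained.
#[allow(dead_code)]
enum Value {
    F64(f64),
    I64(i64),
    U64(u64),
    Bool(bool),
    Bytes(Vec<u8>),
    String(String),
    Histogram,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Lane {
    Numeric,
    Blob,
}

/// Lane a value belongs to, following the table above.
fn lane_of(v: &Value) -> Lane {
    match v {
        Value::F64(_) | Value::I64(_) | Value::U64(_) | Value::Bool(_) => Lane::Numeric,
        Value::Bytes(_) | Value::String(_) | Value::Histogram => Lane::Blob,
    }
}

/// Hypothetical write-path check: the first sample fixes the series lane,
/// later samples must land on the same lane.
fn check_write(series_lane: &mut Option<Lane>, v: &Value) -> Result<Lane, String> {
    let lane = lane_of(v);
    match *series_lane {
        None => {
            *series_lane = Some(lane);
            Ok(lane)
        }
        Some(existing) if existing == lane => Ok(lane),
        Some(existing) => Err(format!(
            "ValueTypeMismatch: series is on the {:?} lane, value belongs to {:?}",
            existing, lane
        )),
    }
}
```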

Encoding Codecs by Type

tsink selects the most compact codec automatically for each chunk at flush time. The codec choice is stored in the chunk header and used verbatim during reads — no re-encoding occurs on read. Timestamp codecs are chosen independently from value codecs:
| Codec | When selected |
|---|---|
| FixedStepRle | All timestamps are evenly spaced (constant scrape interval) |
| DeltaOfDeltaBitpack | Timestamps have a slowly drifting interval |
| DeltaVarint | Irregular timestamps; always applicable as a fallback |
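
The conditions in this table can be recognized by looking at first-order deltas (the intervals) and second-order deltas (how much each interval drifts). The classifier below is only illustrative: the engine encodes the candidates and keeps the smallest payload, and `max_dod` is a hypothetical threshold.

```rust
#[derive(Debug, PartialEq)]
enum TsCodec {
    FixedStepRle,
    DeltaOfDeltaBitpack,
    DeltaVarint,
}

fn suggest_ts_codec(ts: &[i64], max_dod: u64) -> TsCodec {
    // First-order deltas: intervals between consecutive timestamps.
    let deltas: Vec<i64> = ts.windows(2).map(|w| w[1].wrapping_sub(w[0])).collect();
    // Second-order deltas: change of each interval relative to the previous one.
    let dods: Vec<u64> = deltas
        .windows(2)
        .map(|w| w[1].wrapping_sub(w[0]).unsigned_abs())
        .collect();
    if dods.iter().all(|&d| d == 0) {
        TsCodec::FixedStepRle // every interval identical
    } else if dods.iter().all(|&d| d <= max_dod) {
        TsCodec::DeltaOfDeltaBitpack // intervals drift only slightly
    } else {
        TsCodec::DeltaVarint // irregular spacing
    }
}
```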
Value codecs by type:
| Type | Codec | Notes |
|---|---|---|
| f64 | Gorilla XOR | Facebook Gorilla XOR-based float compression |
| i64 | ZigZag delta bitpack | Maps signed deltas to unsigned, then bitpacks |
| u64 | Delta bitpack | Non-negative deltas bitpacked directly |
| bool | Bitpack | 1 bit per sample |
| bytes / string / NativeHistogram | Bytes delta block | Variable-length records with delta compression |
| Any type | Constant RLE | Applied when all values in a chunk are identical; takes priority over type-specific codecs |
The engine evaluates all applicable candidates for a given chunk and selects the one producing the smallest payload.
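
Gorilla XOR, from Facebook's Gorilla time-series paper, XORs each value's IEEE 754 bit pattern with the previous one. The sketch below shows only that core step and how many meaningful bits remain; it does not reproduce tsink's actual bit stream, and the function names are hypothetical.

```rust
/// For each consecutive pair, XOR the bit patterns and report
/// (xor, leading_zeros, trailing_zeros).
fn xor_stats(values: &[f64]) -> Vec<(u64, u32, u32)> {
    values
        .windows(2)
        .map(|w| {
            let x = w[0].to_bits() ^ w[1].to_bits();
            (x, x.leading_zeros(), x.trailing_zeros())
        })
        .collect()
}

/// Bits left between the leading and trailing zero runs. Gorilla stores only
/// these plus control bits; an identical value (xor == 0) costs a single bit.
fn meaningful_bits(xor: u64) -> u32 {
    if xor == 0 {
        0
    } else {
        64 - xor.leading_zeros() - xor.trailing_zeros()
    }
}
```

For example, 12.0 and 24.0 differ only by one exponent step, so their XOR is a single set bit.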

Timestamp Precision

The TimestampPrecision configuration setting tells the engine how to interpret the i64 timestamp in each DataPoint:
| Variant | Unit | Maximum date (approx. year) |
|---|---|---|
| Nanoseconds | 1 ns | ~2262 |
| Microseconds | 1 µs | ~294247 |
| Milliseconds | 1 ms | ~292278994 |
| Seconds | 1 s | ~292277026596 |
Precision is configured once on StorageBuilder and applies to all timestamps written to that storage instance. Mixing precisions within a single instance is not supported — timestamps from different precisions are not automatically renormalized. The DataPoint::new constructor accepts the timestamp as a raw i64; callers are responsible for ensuring the value matches the configured precision.
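
Because DataPoint::new takes a raw i64, converting wall-clock time into the configured unit is the caller's job. A std-only sketch; the TimestampPrecision enum here is a local stand-in using the variant names from the table, and pre-1970 times are not handled.

```rust
use std::convert::TryFrom;
use std::time::{Duration, SystemTime, UNIX_EPOCH};

/// Local stand-in for the TimestampPrecision setting.
#[derive(Clone, Copy, Debug)]
enum TimestampPrecision {
    Nanoseconds,
    Microseconds,
    Milliseconds,
    Seconds,
}

/// Express a duration since the Unix epoch in the configured unit.
/// Returns None if the result does not fit in i64.
fn to_raw_timestamp(since_epoch: Duration, precision: TimestampPrecision) -> Option<i64> {
    let raw: u128 = match precision {
        TimestampPrecision::Nanoseconds => since_epoch.as_nanos(),
        TimestampPrecision::Microseconds => since_epoch.as_micros(),
        TimestampPrecision::Milliseconds => since_epoch.as_millis(),
        TimestampPrecision::Seconds => u128::from(since_epoch.as_secs()),
    };
    i64::try_from(raw).ok()
}

/// Current wall-clock time in the configured unit (None if the clock is before 1970).
fn now_raw(precision: TimestampPrecision) -> Option<i64> {
    let since_epoch = SystemTime::now().duration_since(UNIX_EPOCH).ok()?;
    to_raw_timestamp(since_epoch, precision)
}
```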

Metrics and Labels

Metric name

A metric name is an arbitrary UTF-8 string. The only restriction is length:
| Limit | Value |
|---|---|
| Maximum metric name length | 65 535 bytes (u16::MAX) |
An empty metric name is rejected at ingest time.

Labels

A label is a UTF-8 key-value pair. Both name and value must be non-empty. Length limits:
| Field | Limit |
|---|---|
| Label name | 256 bytes |
| Label value | 16 384 bytes (16 KiB) |
Duplicate label names within a single row are rejected. Labels are normalized to lexicographic order by name before computing the series identity, so {a="1", b="2"} and {b="2", a="1"} refer to the same series.
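
The metric and label rules above can be applied client-side before writing. This is a hypothetical pre-flight check, not tsink's validation code; on success it returns the labels sorted by name, the order used for series identity.

```rust
const MAX_METRIC_NAME_LEN: usize = u16::MAX as usize; // 65 535 bytes
const MAX_LABEL_NAME_LEN: usize = 256;
const MAX_LABEL_VALUE_LEN: usize = 16 * 1024; // 16 KiB

fn validate_series<'a>(
    metric: &str,
    labels: &[(&'a str, &'a str)],
) -> Result<Vec<(&'a str, &'a str)>, String> {
    // str::len() is the UTF-8 byte length, matching the byte limits above.
    if metric.is_empty() || metric.len() > MAX_METRIC_NAME_LEN {
        return Err(format!("metric name length {} out of range", metric.len()));
    }
    let mut sorted = labels.to_vec();
    sorted.sort_by_key(|&(name, _)| name);
    // After sorting, duplicate names are adjacent.
    if let Some(pair) = sorted.windows(2).find(|p| p[0].0 == p[1].0) {
        return Err(format!("duplicate label name {:?}", pair[0].0));
    }
    for &(name, value) in &sorted {
        if name.is_empty() || name.len() > MAX_LABEL_NAME_LEN {
            return Err(format!("label name {:?} has invalid length", name));
        }
        if value.is_empty() || value.len() > MAX_LABEL_VALUE_LEN {
            return Err(format!("label {:?} has a value of invalid length", name));
        }
    }
    Ok(sorted)
}
```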

Series identity

The engine assigns each unique (metric, sorted-labels) combination a 64-bit SeriesId. The identity is computed as a stable FNV-1a hash over a canonical binary encoding of the metric name and sorted label pairs. This hash is used internally; callers always identify series by metric name and labels.
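
FNV-1a itself is a published hash with fixed constants. The canonical encoding in this sketch (a little-endian u32 length prefix before each string) is an assumption, so the resulting IDs will not match tsink's; the sketch only shows why sorting labels first makes the identity independent of input order.

```rust
const FNV_OFFSET_BASIS: u64 = 0xcbf2_9ce4_8422_2325;
const FNV_PRIME: u64 = 0x0000_0100_0000_01b3;

/// 64-bit FNV-1a: XOR in each byte, then multiply by the FNV prime (mod 2^64).
fn fnv1a(bytes: &[u8]) -> u64 {
    let mut hash = FNV_OFFSET_BASIS;
    for &b in bytes {
        hash ^= u64::from(b);
        hash = hash.wrapping_mul(FNV_PRIME);
    }
    hash
}

/// Hypothetical canonical encoding: length-prefixed metric name, then each
/// label name and value in sorted order.
fn series_id(metric: &str, labels: &[(&str, &str)]) -> u64 {
    let mut sorted = labels.to_vec();
    sorted.sort();
    let mut buf: Vec<u8> = Vec::new();
    let mut put = |s: &str| {
        buf.extend_from_slice(&(s.len() as u32).to_le_bytes());
        buf.extend_from_slice(s.as_bytes());
    };
    put(metric);
    for &(name, value) in &sorted {
        put(name);
        put(value);
    }
    fnv1a(&buf)
}
```

The length prefix keeps pairs such as ("ab", "c") and ("a", "bc") from producing the same byte stream.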

Type Coercions

The Value type exposes explicit, lossless conversion accessors:
| Method | Returns | Applies to |
|---|---|---|
| as_f64() | Option<f64> | F64, I64 (if it fits in 53 bits), U64 (if it fits in 53 bits) |
| as_i64() | Option<i64> | I64, U64 (if it fits in i64) |
| as_u64() | Option<u64> | U64, I64 (if non-negative) |
| as_bool() | Option<bool> | Bool only |
| as_bytes() | Option<&[u8]> | Bytes only |
| as_str() | Option<&str> | String only |
| as_histogram() | Option<&NativeHistogram> | Histogram only |
as_f64() precision note: an i64 or u64 value is only converted if its absolute value can be represented exactly with 53 mantissa bits (the precision of f64). Values like i64::MAX (63 significant bits) return None. This prevents silent precision loss in numeric pipelines. The PromQL evaluator calls as_f64() on every sample. Series whose values cannot be losslessly represented as f64 — including bool, bytes, string, and oversized integers — are excluded from PromQL evaluation.
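
The exactness rule can be sketched as a magnitude check. This version assumes the cutoff is |v| < 2^53, reading "fits in 53 bits" literally (2^53 itself is also exactly representable, so tsink's boundary handling may differ). The helper names are hypothetical.

```rust
/// Largest magnitude accepted in this sketch: values that fit in 53 bits.
const MAX_EXACT: u64 = (1u64 << 53) - 1;

/// Lossless i64 -> f64 projection under the 53-bit rule; None otherwise.
fn i64_as_f64(v: i64) -> Option<f64> {
    // unsigned_abs avoids the overflow that i64::MIN.abs() would cause.
    (v.unsigned_abs() <= MAX_EXACT).then(|| v as f64)
}

/// Lossless u64 -> f64 projection under the 53-bit rule; None otherwise.
fn u64_as_f64(v: u64) -> Option<f64> {
    (v <= MAX_EXACT).then(|| v as f64)
}
```

A round-trip test such as `(v as f64) as i64 == v` is not a safe substitute: Rust's float-to-integer casts saturate, so i64::MAX rounds up to 2^63 and casts back to i64::MAX, falsely passing the check.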

Custom Types via the Codec Trait

Any Rust type can be stored as Value::Bytes on the blob lane by implementing the Codec trait:
```rust
pub trait Codec: Send + Sync {
    type Item: Clone + Send + Sync + 'static;

    fn encode(&self, value: &Self::Item) -> Result<Vec<u8>>;
    fn decode(&self, bytes: &[u8]) -> Result<Self::Item>;
}
```
Encoding and decoding a point:
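```rust
// Write: encode a custom value into Value::Bytes
let encoded = Value::encode_with(&my_value, &my_codec)?;  // → Value::Bytes(...)

// Read: decode a stored point's bytes back into the custom type
let decoded: MyType = point.value.decode_with(&my_codec)?;
```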
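
A minimal sketch of a Codec implementation for a hypothetical GeoPoint type. To keep it self-contained, the trait is redeclared locally with a String error type; in tsink you would implement the crate's own Codec trait, whose Result type may differ.

```rust
use std::convert::TryInto;

/// Local copy of the Codec trait shape, with a String error type.
trait Codec: Send + Sync {
    type Item: Clone + Send + Sync + 'static;

    fn encode(&self, value: &Self::Item) -> Result<Vec<u8>, String>;
    fn decode(&self, bytes: &[u8]) -> Result<Self::Item, String>;
}

/// Hypothetical custom type: a latitude/longitude fix.
#[derive(Clone, Debug, PartialEq)]
struct GeoPoint {
    lat: f64,
    lon: f64,
}

/// Fixed 16-byte layout: lat then lon, each as a little-endian f64.
struct GeoPointCodec;

impl Codec for GeoPointCodec {
    type Item = GeoPoint;

    fn encode(&self, value: &GeoPoint) -> Result<Vec<u8>, String> {
        let mut out: Vec<u8> = Vec::with_capacity(16);
        out.extend_from_slice(&value.lat.to_le_bytes());
        out.extend_from_slice(&value.lon.to_le_bytes());
        Ok(out)
    }

    fn decode(&self, bytes: &[u8]) -> Result<GeoPoint, String> {
        // Reject payloads of the wrong size instead of reading garbage.
        let raw: [u8; 16] = bytes
            .try_into()
            .map_err(|_| format!("expected 16 bytes, got {}", bytes.len()))?;
        let lat = f64::from_le_bytes(raw[..8].try_into().unwrap());
        let lon = f64::from_le_bytes(raw[8..].try_into().unwrap());
        Ok(GeoPoint { lat, lon })
    }
}
```

A fixed-width layout keeps decoding trivial; a versioned or self-describing format (Protobuf, MessagePack) is the safer choice if the type may evolve.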

Custom aggregation

To aggregate custom-typed series at query time, pair a Codec with an Aggregator:
```rust
pub trait Aggregator<T>: Send + Sync {
    fn aggregate(&self, values: &[T]) -> Option<T>;
}
```
The CodecAggregator<C, A> adapter bridges a Codec and a typed Aggregator into the BytesAggregation interface expected by QueryOptions::custom_aggregation:
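```rust
use std::sync::Arc;

// MyCodec and MyAggregator are your own Codec / Aggregator implementations.
let agg = Arc::new(CodecAggregator::new(MyCodec, MyAggregator));

let options = QueryOptions::new(start, end)
    .with_custom_aggregation(agg);
```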
CodecAggregator decodes each Value::Bytes point using the codec, passes all decoded values to the aggregator, encodes the result back to bytes, and attaches the bucket start timestamp to the aggregate point.
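
The decode, aggregate, re-encode flow described above can be sketched with simplified local copies of the two traits. U64Le, MaxOf and aggregate_bucket are hypothetical and only illustrate the sequence of steps, not CodecAggregator's actual implementation.

```rust
use std::convert::TryInto;

// Simplified local stand-ins for tsink's Codec and Aggregator traits.
trait Codec {
    type Item;
    fn encode(&self, value: &Self::Item) -> Result<Vec<u8>, String>;
    fn decode(&self, bytes: &[u8]) -> Result<Self::Item, String>;
}

trait Aggregator<T> {
    fn aggregate(&self, values: &[T]) -> Option<T>;
}

/// Hypothetical codec: a u64 stored as 8 little-endian bytes.
struct U64Le;

impl Codec for U64Le {
    type Item = u64;
    fn encode(&self, value: &u64) -> Result<Vec<u8>, String> {
        Ok(value.to_le_bytes().to_vec())
    }
    fn decode(&self, bytes: &[u8]) -> Result<u64, String> {
        let arr: [u8; 8] = bytes
            .try_into()
            .map_err(|_| format!("expected 8 bytes, got {}", bytes.len()))?;
        Ok(u64::from_le_bytes(arr))
    }
}

/// Hypothetical aggregator: the largest value in the bucket.
struct MaxOf;

impl Aggregator<u64> for MaxOf {
    fn aggregate(&self, values: &[u64]) -> Option<u64> {
        values.iter().copied().max()
    }
}

/// Decode every payload in one bucket, aggregate the typed values, re-encode
/// the result, and stamp it with the bucket start timestamp.
fn aggregate_bucket<C, A>(
    codec: &C,
    agg: &A,
    bucket_start: i64,
    payloads: &[Vec<u8>],
) -> Result<Option<(i64, Vec<u8>)>, String>
where
    C: Codec,
    A: Aggregator<C::Item>,
{
    let decoded = payloads
        .iter()
        .map(|bytes| codec.decode(bytes))
        .collect::<Result<Vec<C::Item>, String>>()?;
    match agg.aggregate(&decoded) {
        Some(value) => Ok(Some((bucket_start, codec.encode(&value)?))),
        None => Ok(None),
    }
}
```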

Aggregation

When using the built-in Aggregation enum with QueryOptions, the engine aggregates numeric values (f64, and integers coercible to f64) over the query time range or per-bucket when downsampling is enabled:
| Variant | Operation |
|---|---|
| None | No aggregation; raw samples returned |
| Sum | Sum of all values |
| Min | Minimum value |
| Max | Maximum value |
| Avg | Mean |
| First | Earliest sample in the window |
| Last | Latest sample in the window |
| Count | Number of samples |
| Median | Median (50th percentile) |
| Range | max - min |
| Variance | Population variance |
| StdDev | Population standard deviation |
Built-in aggregation operates on the f64 projection of a value (Value::as_f64()). Bytes, String, and NativeHistogram series are not aggregated by the built-in variants; use custom_aggregation for those types.
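
A sketch of what each built-in variant computes over one window of samples already projected to f64. The local Aggregation enum mirrors the table (None omitted). The even-count Median rule (mean of the two middle values) and the lack of special NaN handling are assumptions, not documented tsink behavior.

```rust
#[derive(Clone, Copy, Debug)]
enum Aggregation {
    Sum, Min, Max, Avg, First, Last, Count, Median, Range, Variance, StdDev,
}

/// Aggregate one window of (timestamp, value) samples; None for an empty window.
fn aggregate(op: Aggregation, samples: &[(i64, f64)]) -> Option<f64> {
    if samples.is_empty() {
        return None;
    }
    let vals: Vec<f64> = samples.iter().map(|&(_, v)| v).collect();
    let n = vals.len() as f64;
    let sum: f64 = vals.iter().sum();
    let min = vals.iter().copied().fold(f64::INFINITY, f64::min);
    let max = vals.iter().copied().fold(f64::NEG_INFINITY, f64::max);
    let mean = sum / n;
    // Population variance: divide by n, not n - 1.
    let variance = vals.iter().map(|v| (v - mean).powi(2)).sum::<f64>() / n;
    Some(match op {
        Aggregation::Sum => sum,
        Aggregation::Min => min,
        Aggregation::Max => max,
        Aggregation::Avg => mean,
        // Pick by timestamp so the input order does not matter.
        Aggregation::First => samples.iter().min_by_key(|s| s.0).map(|s| s.1)?,
        Aggregation::Last => samples.iter().max_by_key(|s| s.0).map(|s| s.1)?,
        Aggregation::Count => n,
        Aggregation::Median => {
            let mut sorted = vals.clone();
            sorted.sort_by(f64::total_cmp);
            let mid = sorted.len() / 2;
            if sorted.len() % 2 == 0 {
                (sorted[mid - 1] + sorted[mid]) / 2.0
            } else {
                sorted[mid]
            }
        }
        Aggregation::Range => max - min,
        Aggregation::Variance => variance,
        Aggregation::StdDev => variance.sqrt(),
    })
}
```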

Python Bindings Type Mapping

The UniFFI Python bindings expose the same model names as the Rust API. The mapping is direct:
| Python type | Rust equivalent |
|---|---|
| Value (enum) | Value |
| DataPoint | DataPoint |
| Row | Row |
| Label | Label |
| NativeHistogram | NativeHistogram |
| HistogramBucketSpan | HistogramBucketSpan |
| HistogramCount (enum) | HistogramCount |
| HistogramResetHint (enum) | HistogramResetHint |
Value is a tagged-union enum with named fields per variant:
```python
from tsink import (
    DataPoint,
    HistogramBucketSpan,
    HistogramCount,
    HistogramResetHint,
    NativeHistogram,
    Value,
)

# float64
dp = DataPoint(timestamp=1_700_000_000_000, value=Value.F64(v=1.5))

# int64
dp = DataPoint(timestamp=1_700_000_000_000, value=Value.I64(v=-7))

# uint64
dp = DataPoint(timestamp=1_700_000_000_000, value=Value.U64(v=42))

# bool
dp = DataPoint(timestamp=1_700_000_000_000, value=Value.Bool(v=True))

# bytes
dp = DataPoint(timestamp=1_700_000_000_000, value=Value.Bytes(v=b"\x01\x02"))

# string
dp = DataPoint(timestamp=1_700_000_000_000, value=Value.Str(v="hello"))

# native histogram: positive deltas [3, 2] give bucket counts 3 and 5
hist = NativeHistogram(
    count=HistogramCount.Int(v=8),  # zero_count + 3 + 5
    sum=8.5,  # observations fall in buckets (0.71, 1] and (1, 1.41] at schema 1
    schema=1,
    zero_threshold=0.0,
    zero_count=HistogramCount.Int(v=0),
    negative_spans=[],
    negative_deltas=[],
    negative_counts=[],
    positive_spans=[HistogramBucketSpan(offset=0, length=2)],
    positive_deltas=[3, 2],
    positive_counts=[],
    reset_hint=HistogramResetHint.NO,
    custom_values=[],
)
dp = DataPoint(timestamp=1_700_000_000_000, value=Value.Histogram(v=hist))
```
Note that in the Python bindings the string variant is named Str (not String) to avoid conflicting with the Python built-in.