Data Types & Native Histograms

This document describes every type that can be stored in a tsink data point, how mixed types are handled, how timestamps are represented, the limits that apply to metrics and labels, and how custom user types are embedded inside the bytes lane.

Table of Contents

Core Data Model
Value Types
- float64
- int64
- uint64
- bool
- bytes
- string
- Native Histogram
Value Lanes
Encoding Codecs by Type
Timestamp Precision
Metrics and Labels
Type Coercions
Custom Types via the Codec Trait
Aggregation
Python Bindings Type Mapping

Core Data Model

The smallest unit of storage is a DataPoint, which pairs a typed value with an i64 timestamp. A Row attaches one data point to a metric name and its labels:

pub struct DataPoint {
    pub value: Value,
    pub timestamp: i64,
}

pub struct Row {
    // metric name
    // labels (key-value pairs)
    // data_point
}

A series is identified by a metric name together with the full, sorted set of label key-value pairs. Two rows with the same metric name but different labels belong to different series.

Value Types

All sample payloads are represented by the Value enum:

pub enum Value {
    F64(f64),
    I64(i64),
    U64(u64),
    Bool(bool),
    Bytes(Vec<u8>),
    String(String),
    Histogram(Box<NativeHistogram>),
}

Every value in a chunk must belong to the same value family (see Value Lanes). Mixing different numeric variants, or mixing a numeric variant with a blob variant, in the same write batch returns a ValueTypeMismatch error.

float64

Value::F64(f64) is the default numeric type and the only type directly consumed by the PromQL evaluator. Use it for gauges, counters, latency histograms expressed as raw floats, and any metric that needs PromQL processing. NaN is a valid stored value and is preserved across encoding and decoding. NaN equality is defined as NaN == NaN for storage index purposes (deduplication), which diverges from IEEE 754.
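The NaN rule above can be illustrated with a minimal sketch. The names `stored_eq` and `dedup_samples` are hypothetical and not part of the tsink API; the sketch only assumes that deduplication compares (timestamp, value) pairs and treats any NaN as equal to any other NaN.

```rust
/// Storage-level equality for f64 samples (hypothetical helper):
/// any NaN equals any other NaN, everything else uses IEEE 754 `==`.
fn stored_eq(a: f64, b: f64) -> bool {
    (a.is_nan() && b.is_nan()) || a == b
}

/// Drops a sample when it repeats the previous (timestamp, value) pair
/// under `stored_eq`. Assumes the input is already sorted by timestamp.
fn dedup_samples(samples: &[(i64, f64)]) -> Vec<(i64, f64)> {
    let mut out: Vec<(i64, f64)> = Vec::new();
    for &(ts, v) in samples {
        match out.last() {
            Some(&(last_ts, last_v)) if last_ts == ts && stored_eq(last_v, v) => {}
            _ => out.push((ts, v)),
        }
    }
    out
}
```

With plain IEEE 754 comparison the two NaN samples at the same timestamp would never compare equal, so a duplicate NaN write could not be detected.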

int64

Value::I64(i64) covers signed counters, monotonic event IDs, and any domain where integer semantics matter (no rounding at extreme values). The codec is ZigZag-encoded delta bitpack, which is efficient for slowly changing integers. When queried via as_f64(), the conversion succeeds only if the integer can be represented exactly as a 64-bit float (i.e., the absolute value fits in 53 bits). Large values like i64::MAX return None rather than silently rounding.
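As a rough illustration of why ZigZag helps here, the sketch below maps signed deltas to small unsigned integers so that both small increases and small decreases need few bits. It shows the standard ZigZag transform only; the function names are hypothetical and the actual tsink bit-packing layout is not shown.

```rust
/// Standard ZigZag mapping: 0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, ...
fn zigzag_encode(n: i64) -> u64 {
    ((n << 1) ^ (n >> 63)) as u64
}

/// Inverse of `zigzag_encode`.
fn zigzag_decode(z: u64) -> i64 {
    ((z >> 1) as i64) ^ -((z & 1) as i64)
}

/// ZigZag-encoded deltas between consecutive values (first value not included).
fn zigzag_deltas(values: &[i64]) -> Vec<u64> {
    values
        .windows(2)
        .map(|w| zigzag_encode(w[1].wrapping_sub(w[0])))
        .collect()
}
```

For a slowly changing series such as 100, 101, 99, 99 the encoded deltas are 2, 3, 0, each of which fits in two bits.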

uint64

Value::U64(u64) covers unsigned accumulators and bitmask values. The codec is delta-bitpack (no ZigZag needed for non-negative deltas). The same exactness rule as i64 applies when converting to f64.

bool

Value::Bool(bool) is stored with a bitpack codec — one bit per sample. It is not coercible to f64; use a 0.0/1.0 float if you need PromQL arithmetic on boolean flags.
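A one-bit-per-sample layout can be sketched as follows. The least-significant-bit-first ordering and the function names are assumptions made for illustration; the on-disk bit order used by tsink is not specified here.

```rust
/// Packs booleans into bytes, one bit per sample, LSB first (assumed order).
fn pack_bools(values: &[bool]) -> Vec<u8> {
    let mut out = vec![0u8; (values.len() + 7) / 8];
    for (i, &v) in values.iter().enumerate() {
        if v {
            out[i / 8] |= 1u8 << (i % 8);
        }
    }
    out
}

/// Reads back `len` booleans from a buffer produced by `pack_bools`.
/// Panics if `bytes` is shorter than `len` bits.
fn unpack_bools(bytes: &[u8], len: usize) -> Vec<bool> {
    (0..len)
        .map(|i| (bytes[i / 8] & (1u8 << (i % 8))) != 0)
        .collect()
}
```

Nine samples occupy two bytes in this layout, which is why the sample count has to be stored separately from the packed payload.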

bytes

Value::Bytes(Vec<u8>) is the escape hatch for any payload that does not fit into a numeric type — serialized Protobuf, MessagePack, JSON, or any custom binary encoding. No interpretation is done on the byte payload by the engine. string and bytes share the same blob value lane and the same on-disk codec (bytes delta block). From the engine’s perspective they are interchangeable; the distinction is only at the API boundary. Custom Rust types can be embedded as bytes via the Codec trait — see Custom Types via the Codec Trait.

string

Value::String(String) stores UTF-8 text. Like bytes, it is stored on the blob lane using the bytes delta block codec and does not participate in numeric aggregation.

Native Histogram

Value::Histogram(Box<NativeHistogram>) stores a complete Prometheus-compatible native histogram sample. Histograms use the blob lane and are serialized with the bytes delta block codec.

NativeHistogram Structure

pub struct NativeHistogram {
    // Total sample count. Either an integer or a float (for weighted observations).
    pub count: Option<HistogramCount>,
    // Sum of all observed values.
    pub sum: f64,
    // Exponential bucket schema (-4..=8, or -53 for custom buckets).
    pub schema: i32,
    // Half-width of the zero bucket.
    pub zero_threshold: f64,
    // Count of samples in the zero bucket.
    pub zero_count: Option<HistogramCount>,

    // Negative-side sparse buckets.
    pub negative_spans: Vec<HistogramBucketSpan>,
    // Delta-encoded bucket counts (integer mode).
    pub negative_deltas: Vec<i64>,
    // Absolute bucket counts (float mode).
    pub negative_counts: Vec<f64>,

    // Positive-side sparse buckets.
    pub positive_spans: Vec<HistogramBucketSpan>,
    pub positive_deltas: Vec<i64>,
    pub positive_counts: Vec<f64>,

    // Reset hint communicated to downstream consumers.
    pub reset_hint: HistogramResetHint,
    // Explicit bucket boundaries for custom schema (schema = -53).
    pub custom_values: Vec<f64>,
}

HistogramCount — count and zero count can be either integer or float to accommodate both classical integer counting and weighted/scaled histograms:

pub enum HistogramCount {
    Int(u64),
    Float(f64),
}

HistogramBucketSpan — describes a contiguous run of populated buckets in the sparse representation:

pub struct HistogramBucketSpan {
    pub offset: i32,  // gap in bucket index from the previous span's end
    pub length: u32,  // number of consecutive populated buckets
}

HistogramResetHint — indicates whether the histogram was reset before this sample:

Variant	Meaning
`Unknown`	Reset status is not known
`Yes`	A reset definitely occurred
`No`	No reset occurred
`Gauge`	This is a gauge histogram (not accumulated)

Sparse bucket encoding

Buckets are stored sparsely as a sequence of (span, deltas/counts) pairs. The negative_spans/positive_spans arrays describe which bucket slots are populated; negative_deltas/positive_deltas give the delta-encoded integer counts for each slot; negative_counts/positive_counts give the absolute float counts (used instead of deltas when any count is non-integer). Either the *_deltas fields or the *_counts fields are populated for a given sample — not both. When float counts are present, *_deltas should be empty, and vice versa.
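The integer-mode layout can be made concrete with a small decoder. This is a sketch under the Prometheus convention that the first span offset is the starting bucket index, later offsets are gaps from the previous span's end, and each delta is relative to the previous populated bucket. `Span` is a local stand-in for HistogramBucketSpan, and `expand_buckets` is a hypothetical helper, not a tsink function.

```rust
/// Local stand-in for HistogramBucketSpan.
#[derive(Clone, Copy)]
struct Span {
    offset: i32,
    length: u32,
}

/// Expands spans plus delta-encoded counts into (bucket index, absolute count)
/// pairs. Returns None when the number of deltas does not match the spans.
fn expand_buckets(spans: &[Span], deltas: &[i64]) -> Option<Vec<(i32, i64)>> {
    let mut out = Vec::with_capacity(deltas.len());
    let mut index: i32 = 0;
    let mut count: i64 = 0;
    let mut remaining = deltas.iter();
    for span in spans {
        index += span.offset; // skip the gap of empty buckets
        for _ in 0..span.length {
            count += *remaining.next()?; // delta relative to the previous bucket
            out.push((index, count));
            index += 1;
        }
    }
    if remaining.next().is_some() {
        return None; // more deltas than populated bucket slots
    }
    Some(out)
}
```

For spans (offset 0, length 2) and (offset 3, length 1) with deltas 3, 2, -4, the populated buckets are index 0 with count 3, index 1 with count 5, and index 5 with count 1.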

NaN semantics

For purposes of stored equality (used by deduplication and the WAL), NaN values inside a histogram are considered equal to other NaN values of the same sign.

Value Lanes

Internally, every series is assigned to one of two mutually exclusive value lanes based on the type of its first ingested sample:

Lane	Types	On-disk directory
`Numeric`	`f64`, `i64`, `u64`, `bool`	`lane_numeric/`
`Blob`	`bytes`, `string`, `NativeHistogram`	`lane_blob/`

The lane is derived at ingest time and persisted in the series registry. Once a series is assigned to a lane, all subsequent writes must use a compatible value type. Writing a numeric type to a blob-lane series, or vice versa, returns a ValueTypeMismatch error. Keeping numeric and blob data physically separate allows their compaction jobs to run independently and avoids mixing integer/float codecs with variable-length blob codecs in the same segment file.
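The lane rule can be sketched as a small check. `Kind`, `Lane`, `lane_of`, and `check_write` are hypothetical names used only for illustration, and the error is modelled as a string rather than the real ValueTypeMismatch type. The sketch covers the lane-level check only.

```rust
/// Simplified tag for the Value variants (payloads omitted).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Kind { F64, I64, U64, Bool, Bytes, String, Histogram }

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Lane { Numeric, Blob }

/// Maps a value kind to its lane, following the table above.
fn lane_of(kind: Kind) -> Lane {
    match kind {
        Kind::F64 | Kind::I64 | Kind::U64 | Kind::Bool => Lane::Numeric,
        Kind::Bytes | Kind::String | Kind::Histogram => Lane::Blob,
    }
}

/// The first write fixes the lane; later writes must land in the same lane.
fn check_write(assigned: Option<Lane>, kind: Kind) -> Result<Lane, String> {
    let incoming = lane_of(kind);
    match assigned {
        None => Ok(incoming),
        Some(lane) if lane == incoming => Ok(lane),
        Some(lane) => Err(format!(
            "ValueTypeMismatch: series lane is {:?}, write targets {:?}",
            lane, incoming
        )),
    }
}
```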

Encoding Codecs by Type

tsink selects the most compact codec automatically for each chunk at flush time. The codec choice is stored in the chunk header and used verbatim during reads — no re-encoding occurs on read. Timestamp codecs are chosen independently from value codecs:

Codec	When selected
`FixedStepRle`	All timestamps are evenly spaced (constant scrape interval)
`DeltaOfDeltaBitpack`	Timestamps have a slowly drifting interval
`DeltaVarint`	Irregular timestamps; always applicable as a fallback
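To see why the spacing of timestamps drives the codec choice, the following sketch computes first and second differences. It is an illustration of the underlying idea, not the tsink selection logic, and the function names are hypothetical.

```rust
/// First differences between consecutive timestamps.
fn deltas(ts: &[i64]) -> Vec<i64> {
    ts.windows(2).map(|w| w[1] - w[0]).collect()
}

/// Second differences: zero everywhere for a constant scrape interval,
/// small values when the interval drifts slowly.
fn delta_of_deltas(ts: &[i64]) -> Vec<i64> {
    deltas(&deltas(ts))
}

/// True when every interval is identical, so start + step + count
/// would be enough to describe the whole sequence.
fn is_fixed_step(ts: &[i64]) -> bool {
    delta_of_deltas(ts).iter().all(|&d| d == 0)
}
```

Timestamps 1000, 1015, 1030, 1045 have second differences 0, 0. With one sample arriving a unit late (1000, 1015, 1031, 1046) the second differences become 1, -1, which are still cheap to bit-pack.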

Value codecs by type:

Type	Codec	Notes
`f64`	Gorilla XOR	Facebook Gorilla XOR-based float compression
`i64`	ZigZag delta bitpack	Maps signed deltas to unsigned, then bitpacks
`u64`	Delta bitpack	Non-negative deltas bitpacked directly
`bool`	Bit-pack	1 bit per sample
`bytes` / `string` / `NativeHistogram`	Bytes delta block	Variable-length records with delta compression
Any type	Constant RLE	Applied when all values in a chunk are identical; takes priority over type-specific codecs

The engine evaluates all applicable candidates for a given chunk and selects the one producing the smallest payload.

Timestamp Precision

The TimestampPrecision configuration setting tells the engine how to interpret the i64 timestamp in each DataPoint:

Variant	Unit	Maximum date
`Nanoseconds`	1 ns	~2262
`Microseconds`	1 µs	~294246
`Milliseconds`	1 ms	~292278994
`Seconds`	1 s	~292277026596

Precision is configured once on StorageBuilder and applies to all timestamps written to that storage instance. Mixing precisions within a single instance is not supported — timestamps from different precisions are not automatically renormalized. The DataPoint::new constructor accepts the timestamp as a raw i64; callers are responsible for ensuring the value matches the configured precision.
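Because the engine does not renormalize timestamps, a caller typically converts to the configured unit before building a data point. The sketch below shows one way to do that with the standard library. The enum is a local mirror of TimestampPrecision and `to_raw_timestamp` is a hypothetical helper, not a tsink function.

```rust
use std::time::Duration;

/// Local mirror of the TimestampPrecision variants.
#[derive(Clone, Copy)]
enum TimestampPrecision { Nanoseconds, Microseconds, Milliseconds, Seconds }

/// Converts a duration since the Unix epoch into a raw i64 timestamp in the
/// given unit. Returns None if the value does not fit in an i64.
fn to_raw_timestamp(since_epoch: Duration, precision: TimestampPrecision) -> Option<i64> {
    let raw: u128 = match precision {
        TimestampPrecision::Nanoseconds => since_epoch.as_nanos(),
        TimestampPrecision::Microseconds => since_epoch.as_micros(),
        TimestampPrecision::Milliseconds => since_epoch.as_millis(),
        TimestampPrecision::Seconds => since_epoch.as_secs() as u128,
    };
    if raw <= i64::MAX as u128 { Some(raw as i64) } else { None }
}
```

The overflow check matters mostly for nanoseconds, where i64 runs out around the year 2262.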

Metrics and Labels

Metric name

A metric name is an arbitrary UTF-8 string. The only restriction is length:

Limit	Value
Maximum metric name length	65 535 bytes (`u16::MAX`)

An empty metric name is rejected at ingest time.

Labels

A label is a UTF-8 key-value pair. Both name and value must be non-empty. Length limits:

Field	Limit
Label name	256 bytes
Label value	16 384 bytes (16 KiB)

Duplicate label names within a single row are rejected. Labels are normalized to lexicographic order by name before computing the series identity, so {a="1", b="2"} and {b="2", a="1"} refer to the same series.
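The label rules above can be summarized in a short validation sketch. `normalize_labels` is a hypothetical helper, labels are modelled as plain string pairs instead of the Label type, and the limits are assumed to be inclusive.

```rust
const MAX_LABEL_NAME_LEN: usize = 256;
const MAX_LABEL_VALUE_LEN: usize = 16_384;

/// Validates labels and returns them sorted by name.
/// Errors are plain strings here; the real error type is not shown.
fn normalize_labels(
    mut labels: Vec<(String, String)>,
) -> Result<Vec<(String, String)>, String> {
    for (name, value) in &labels {
        if name.is_empty() || value.is_empty() {
            return Err("label name and value must be non-empty".to_string());
        }
        if name.len() > MAX_LABEL_NAME_LEN {
            return Err(format!("label name too long: {} bytes", name.len()));
        }
        if value.len() > MAX_LABEL_VALUE_LEN {
            return Err(format!("label value too long: {} bytes", value.len()));
        }
    }
    // Lexicographic order by name makes the series identity order-independent.
    labels.sort_by(|a, b| a.0.cmp(&b.0));
    if labels.windows(2).any(|w| w[0].0 == w[1].0) {
        return Err("duplicate label name".to_string());
    }
    Ok(labels)
}
```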

Series identity

The engine assigns each unique (metric, sorted-labels) combination a 64-bit SeriesId. The identity is computed as a stable FNV-1a hash over a canonical binary encoding of the metric name and sorted label pairs. This hash is used internally; callers always identify series by metric name and labels.
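The following sketch shows how such an identity could be computed. The FNV-1a constants are the standard 64-bit ones, but the canonical encoding (a little-endian u32 length prefix before each field) is an assumption made for illustration, so the resulting values will not match real tsink SeriesId values.

```rust
const FNV_OFFSET_BASIS: u64 = 0xcbf2_9ce4_8422_2325;
const FNV_PRIME: u64 = 0x0000_0100_0000_01b3;

/// Feeds bytes into a running 64-bit FNV-1a hash.
fn fnv1a_update(mut hash: u64, bytes: &[u8]) -> u64 {
    for &b in bytes {
        hash ^= b as u64;
        hash = hash.wrapping_mul(FNV_PRIME);
    }
    hash
}

/// Hashes one length-prefixed field (assumed encoding, see lead-in).
fn hash_field(hash: u64, field: &str) -> u64 {
    let hash = fnv1a_update(hash, &(field.len() as u32).to_le_bytes());
    fnv1a_update(hash, field.as_bytes())
}

/// Order-independent identity for a metric name plus label pairs.
fn series_id(metric: &str, labels: &[(&str, &str)]) -> u64 {
    let mut sorted = labels.to_vec();
    sorted.sort(); // sort by name (names are unique, so ties cannot occur)
    let mut hash = hash_field(FNV_OFFSET_BASIS, metric);
    for (name, value) in sorted {
        hash = hash_field(hash, name);
        hash = hash_field(hash, value);
    }
    hash
}
```

The length prefix keeps adjacent fields from running together, so a label `ab="c"` and a label `a="bc"` do not feed the same byte stream into the hash.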

Type Coercions

The Value type exposes explicit, lossless conversion accessors:

Method	Returns	Applies to
`as_f64()`	`Option<f64>`	`F64`, `I64` (if fits in 53 bits), `U64` (if fits in 53 bits)
`as_i64()`	`Option<i64>`	`I64`, `U64` (if fits in `i64`)
`as_u64()`	`Option<u64>`	`U64`, `I64` (if non-negative)
`as_bool()`	`Option<bool>`	`Bool` only
`as_bytes()`	`Option<&[u8]>`	`Bytes` only
`as_str()`	`Option<&str>`	`String` only
`as_histogram()`	`Option<&NativeHistogram>`	`Histogram` only

as_f64() precision note: an i64 or u64 value is only converted if its absolute value can be represented exactly with 53 mantissa bits (the precision of f64). Values like i64::MAX (63 significant bits) return None. This prevents silent precision loss in numeric pipelines. The PromQL evaluator calls as_f64() on every sample. Series whose values cannot be losslessly represented as f64 — including bool, bytes, string, and oversized integers — are excluded from PromQL evaluation.
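The exactness rule can be sketched with two free functions. These are hypothetical stand-ins for the integer branches of as_f64(), and the boundary is assumed to be exclusive (magnitudes strictly below 2^53), matching the "fits in 53 bits" wording.

```rust
/// Integers with a magnitude below 2^53 convert to f64 without rounding.
const EXACT_LIMIT: u64 = 1u64 << 53;

/// Lossless i64 -> f64, or None when the magnitude needs more than 53 bits.
fn i64_as_f64(v: i64) -> Option<f64> {
    if v.unsigned_abs() < EXACT_LIMIT { Some(v as f64) } else { None }
}

/// Lossless u64 -> f64, or None when the value needs more than 53 bits.
fn u64_as_f64(v: u64) -> Option<f64> {
    if v < EXACT_LIMIT { Some(v as f64) } else { None }
}
```

Under this rule a series of large event IDs stored as I64 would be readable through as_i64() but skipped by anything that relies on the f64 projection.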

Custom Types via the Codec Trait

Any Rust type can be stored inside the bytes lane by implementing the Codec trait:

pub trait Codec: Send + Sync {
    type Item: Clone + Send + Sync + 'static;

    fn encode(&self, value: &Self::Item) -> Result<Vec<u8>>;
    fn decode(&self, bytes: &[u8]) -> Result<Self::Item>;
}

Encoding and decoding a point:

// Write
let encoded = Value::encode_with(&my_value, &my_codec)?;  // → Value::Bytes(...)

// Read
let decoded: MyType = point.value.decode_with(&my_codec)?;
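A codec implementation might look like the sketch below. To keep it self-contained, the trait is redeclared locally with a stand-in error type (`CodecError` and `CodecResult` are assumptions, since the real Result alias comes from tsink), and `GeoPoint` with its fixed 16-byte little-endian layout is a hypothetical example type.

```rust
#[derive(Debug, PartialEq)]
struct CodecError(String);
type CodecResult<T> = Result<T, CodecError>;

/// Local copy of the trait shape shown above, with a stand-in error type.
trait Codec: Send + Sync {
    type Item: Clone + Send + Sync + 'static;
    fn encode(&self, value: &Self::Item) -> CodecResult<Vec<u8>>;
    fn decode(&self, bytes: &[u8]) -> CodecResult<Self::Item>;
}

#[derive(Debug, Clone, PartialEq)]
struct GeoPoint { lat: f64, lon: f64 }

struct GeoPointCodec;

impl Codec for GeoPointCodec {
    type Item = GeoPoint;

    /// Layout: lat (8 bytes LE) followed by lon (8 bytes LE).
    fn encode(&self, value: &GeoPoint) -> CodecResult<Vec<u8>> {
        let mut out = Vec::with_capacity(16);
        out.extend_from_slice(&value.lat.to_le_bytes());
        out.extend_from_slice(&value.lon.to_le_bytes());
        Ok(out)
    }

    fn decode(&self, bytes: &[u8]) -> CodecResult<GeoPoint> {
        if bytes.len() != 16 {
            return Err(CodecError(format!("expected 16 bytes, got {}", bytes.len())));
        }
        let mut lat = [0u8; 8];
        let mut lon = [0u8; 8];
        lat.copy_from_slice(&bytes[..8]);
        lon.copy_from_slice(&bytes[8..]);
        Ok(GeoPoint { lat: f64::from_le_bytes(lat), lon: f64::from_le_bytes(lon) })
    }
}
```

Rejecting payloads of the wrong length in decode is a deliberate choice: the engine does not interpret the bytes, so the codec is the only place where a corrupt or foreign payload can be caught.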

Custom aggregation

To aggregate custom-typed series at query time, pair a Codec with an Aggregator:

pub trait Aggregator<T>: Send + Sync {
    fn aggregate(&self, values: &[T]) -> Option<T>;
}

The CodecAggregator<C, A> adapter bridges a Codec and a typed Aggregator into the BytesAggregation interface expected by QueryOptions::custom_aggregation:

let agg = Arc::new(CodecAggregator::new(MyCodec, MyAggregator));

let options = QueryOptions::new(start, end)
    .with_custom_aggregation(agg);

CodecAggregator decodes each Value::Bytes point using the codec, passes all decoded values to the aggregator, encodes the result back to bytes, and attaches the bucket start timestamp to the aggregate point.
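The decode, aggregate, encode flow described above can be sketched without the real adapter. The trait is redeclared locally, and `MaxU32` and `aggregate_payloads` are hypothetical. Payloads are assumed to be 4-byte little-endian u32 values, and timestamp handling is left out.

```rust
/// Local copy of the trait shape shown above.
trait Aggregator<T>: Send + Sync {
    fn aggregate(&self, values: &[T]) -> Option<T>;
}

/// Keeps the largest value in the window.
struct MaxU32;

impl Aggregator<u32> for MaxU32 {
    fn aggregate(&self, values: &[u32]) -> Option<u32> {
        values.iter().copied().max()
    }
}

/// Decodes every payload, aggregates the typed values, and encodes the result.
/// Returns None for an empty window or a malformed payload.
fn aggregate_payloads<A: Aggregator<u32>>(agg: &A, payloads: &[Vec<u8>]) -> Option<Vec<u8>> {
    let mut decoded = Vec::with_capacity(payloads.len());
    for p in payloads {
        if p.len() != 4 {
            return None;
        }
        decoded.push(u32::from_le_bytes([p[0], p[1], p[2], p[3]]));
    }
    agg.aggregate(&decoded).map(|v| v.to_le_bytes().to_vec())
}
```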

Aggregation

When using the built-in Aggregation enum with QueryOptions, the engine aggregates numeric values (f64, and integers coercible to f64) over the query time range or per-bucket when downsampling is enabled:

Variant	Operation
`None`	No aggregation — raw samples returned
`Sum`	Sum of all values
`Min`	Minimum value
`Max`	Maximum value
`Avg`	Mean
`First`	Earliest sample in the window
`Last`	Latest sample in the window
`Count`	Number of samples
`Median`	Median (50th percentile)
`Range`	`max - min`
`Variance`	Population variance
`StdDev`	Population standard deviation

Built-in aggregation operates on the f64 projection of a value (Value::as_f64()). Bytes, String, and NativeHistogram series are not aggregated by the built-in variants; use custom_aggregation for those types.
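For reference, the less obvious variants can be written out over a slice of f64 projections. This is a sketch of the textbook definitions, not the engine's code. In particular, averaging the two middle values for an even-length median is an assumption.

```rust
/// Arithmetic mean, or None for an empty window.
fn mean(v: &[f64]) -> Option<f64> {
    if v.is_empty() { None } else { Some(v.iter().sum::<f64>() / v.len() as f64) }
}

/// Population variance: mean squared deviation, dividing by n (not n - 1).
fn population_variance(v: &[f64]) -> Option<f64> {
    let m = mean(v)?;
    Some(v.iter().map(|x| (x - m).powi(2)).sum::<f64>() / v.len() as f64)
}

/// Population standard deviation.
fn population_std_dev(v: &[f64]) -> Option<f64> {
    population_variance(v).map(f64::sqrt)
}

/// Median; averages the two middle values when the count is even (assumed).
fn median(v: &[f64]) -> Option<f64> {
    if v.is_empty() {
        return None;
    }
    let mut sorted = v.to_vec();
    sorted.sort_by(|a, b| a.total_cmp(b));
    let mid = sorted.len() / 2;
    Some(if sorted.len() % 2 == 0 { (sorted[mid - 1] + sorted[mid]) / 2.0 } else { sorted[mid] })
}

/// max - min over the window.
fn range(v: &[f64]) -> Option<f64> {
    if v.is_empty() {
        return None;
    }
    let max = v.iter().copied().fold(f64::NEG_INFINITY, f64::max);
    let min = v.iter().copied().fold(f64::INFINITY, f64::min);
    Some(max - min)
}
```

For the window 2, 4, 4, 4, 5, 5, 7, 9 these give a mean of 5, a population variance of 4, a standard deviation of 2, a median of 4.5, and a range of 7.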

Python Bindings Type Mapping

The UniFFI Python bindings expose the same model names as the Rust API. The mapping is direct:

Python type	Rust equivalent
`Value` (enum)	`Value`
`DataPoint`	`DataPoint`
`Row`	`Row`
`Label`	`Label`
`NativeHistogram`	`NativeHistogram`
`HistogramBucketSpan`	`HistogramBucketSpan`
`HistogramCount` (enum)	`HistogramCount`
`HistogramResetHint` (enum)	`HistogramResetHint`

Value is a tagged-union enum with named fields per variant:

from tsink import DataPoint, HistogramBucketSpan, HistogramCount, HistogramResetHint, NativeHistogram, Value

# float64
dp = DataPoint(timestamp=1_700_000_000_000, value=Value.F64(v=1.5))

# int64
dp = DataPoint(timestamp=1_700_000_000_000, value=Value.I64(v=-7))

# uint64
dp = DataPoint(timestamp=1_700_000_000_000, value=Value.U64(v=42))

# bool
dp = DataPoint(timestamp=1_700_000_000_000, value=Value.Bool(v=True))

# bytes
dp = DataPoint(timestamp=1_700_000_000_000, value=Value.Bytes(v=b"\x01\x02"))

# string
dp = DataPoint(timestamp=1_700_000_000_000, value=Value.Str(v="hello"))

# native histogram
hist = NativeHistogram(
    count=HistogramCount.Int(v=8),  # matches positive_deltas [3, 2], which decode to bucket counts [3, 5]
    sum=5.5,
    schema=1,
    zero_threshold=0.0,
    zero_count=HistogramCount.Int(v=0),
    negative_spans=[],
    negative_deltas=[],
    negative_counts=[],
    positive_spans=[HistogramBucketSpan(offset=0, length=2)],
    positive_deltas=[3, 2],
    positive_counts=[],
    reset_hint=HistogramResetHint.NO,
    custom_values=[],
)
dp = DataPoint(timestamp=1_700_000_000_000, value=Value.Histogram(v=hist))

Note that in the Python bindings the string variant is named Str (not String) to avoid conflicting with the Python built-in.
