
Monitoring & Observability

tsink exposes three built-in observability surfaces: health probes for Kubernetes liveness/readiness checks, a Prometheus-format self-instrumentation endpoint, and support bundles for ad-hoc diagnostics. All three are available without any extra configuration.

Health probes

| Endpoint | Purpose |
|---|---|
| `GET /healthz` | Liveness probe: returns `ok` with HTTP 200 if the server process is running |
| `GET /ready` | Readiness probe: returns `ready` with HTTP 200 when the server is ready to serve traffic |
Both endpoints bypass authentication, respond with `Content-Type: text/plain`, and are safe to poll from infrastructure tools without bearer tokens.

Kubernetes example:

```yaml
livenessProbe:
  httpGet:
    path: /healthz
    port: 9201
  initialDelaySeconds: 5
  periodSeconds: 10

readinessProbe:
  httpGet:
    path: /ready
    port: 9201
  initialDelaySeconds: 5
  periodSeconds: 10
```
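Scripts and CI jobs that must wait for a freshly started server before sending traffic can poll the same `/ready` endpoint directly. The following is a minimal standard-library sketch. The function names, the 30-second default deadline, and the one-second poll interval are illustrative assumptions, not part of tsink.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def probe(url: str, timeout: float = 2.0) -> bool:
    """Return True if a GET on `url` answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Refused connections, timeouts, and non-2xx answers (HTTPError) all mean "not ready".
        return False


def wait_until_ready(
    base_url: str,
    deadline_s: float = 30.0,
    interval_s: float = 1.0,
    check: Callable[[str], bool] = probe,
) -> bool:
    """Poll <base_url>/ready until it returns 200 or the deadline passes."""
    end = time.monotonic() + deadline_s
    while True:
        if check(f"{base_url.rstrip('/')}/ready"):
            return True
        if time.monotonic() >= end:
            return False
        time.sleep(interval_s)
```

For example, `wait_until_ready("http://127.0.0.1:9201")` blocks for up to 30 seconds. The `check` parameter exists so the polling loop can be exercised without a running server.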

Self-instrumentation endpoint

`GET /metrics`

Returns all internal metrics in Prometheus text exposition format 0.0.4, suitable for direct Prometheus scraping.

```shell
curl http://127.0.0.1:9201/metrics
```
When RBAC is enabled, this endpoint requires a token with the metrics:read permission. In unauthenticated deployments it is open.
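Outside Prometheus, for example in a smoke test, you can read the endpoint's output with a small parser. This minimal sketch understands only sample lines of the text format: it skips `# HELP`/`# TYPE` comments, ignores optional timestamps, does not unescape label values, and assumes no `}` inside a label value. The bearer-token header for RBAC-enabled deployments is an assumption, modeled on the `Authorization: Bearer` usage shown for support bundles.

```python
import re
import urllib.request
from typing import Dict, Optional, Tuple

Labels = Tuple[Tuple[str, str], ...]

# name, optional {labels}, value, optional integer timestamp
_SAMPLE = re.compile(
    r"^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)"
    r"(?:\{(?P<labels>[^}]*)\})?"
    r"\s+(?P<value>\S+)(?:\s+-?\d+)?$"
)
_LABEL = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_metrics(text: str) -> Dict[Tuple[str, Labels], float]:
    """Map (metric name, sorted label pairs) -> sample value."""
    samples: Dict[Tuple[str, Labels], float] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # blank line or HELP/TYPE comment
        m = _SAMPLE.match(line)
        if m is None:
            continue  # not a sample line this sketch understands
        labels = tuple(sorted(_LABEL.findall(m.group("labels") or "")))
        samples[(m.group("name"), labels)] = float(m.group("value"))  # handles +Inf/NaN
    return samples


def fetch_metrics(base_url: str, token: Optional[str] = None) -> str:
    """GET <base_url>/metrics, adding a bearer token when one is given."""
    req = urllib.request.Request(f"{base_url.rstrip('/')}/metrics")
    if token:
        req.add_header("Authorization", f"Bearer {token}")
    with urllib.request.urlopen(req, timeout=5) as resp:
        return resp.read().decode("utf-8")
```

For example, `parse_metrics(fetch_metrics("http://127.0.0.1:9201"))[("tsink_uptime_seconds", ())]` reads the uptime gauge. For anything beyond quick checks, an official Prometheus client library's parser is the safer choice.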

Scraping with Prometheus

```yaml
scrape_configs:
  - job_name: tsink
    static_configs:
      - targets: ['127.0.0.1:9201']
    # If RBAC is enabled:
    # authorization:
    #   credentials: <service-account-token>
```

Metric reference

All metrics use the tsink_ prefix. The sections below enumerate every metric group emitted by the server.

General

| Metric | Type | Description |
|---|---|---|
| `tsink_uptime_seconds` | gauge | Server uptime in seconds |
| `tsink_series_total` | gauge | Number of known metric series |

Memory

| Metric | Type | Description |
|---|---|---|
| `tsink_memory_used_bytes` | gauge | Bytes counted against the configured memory budget |
| `tsink_memory_budget_bytes` | gauge | Configured memory budget |
| `tsink_memory_excluded_bytes` | gauge | Memory intentionally excluded from the budget |
| `tsink_memory_registry_bytes` | gauge | Budget bytes used by the in-memory series registry |
| `tsink_memory_metadata_cache_bytes` | gauge | Budget bytes used by metadata caches and indexes |
| `tsink_memory_persisted_index_bytes` | gauge | Budget bytes used by persisted chunk refs and timestamp indexes |
| `tsink_memory_persisted_mmap_bytes` | gauge | Budget bytes used by mmap-backed segment payloads |
| `tsink_memory_tombstone_bytes` | gauge | Budget bytes used by tombstone state |
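The memory gauges are most useful as a ratio and a breakdown. The sketch below takes a plain `{metric name: value}` mapping of unlabeled gauges from one scrape. It reports budget utilization and the budget components ordered largest first. Two choices are assumptions: a budget of 0 is reported as "unknown" (`None`) instead of being interpreted, and the components are not assumed to sum exactly to `tsink_memory_used_bytes`, because the table above does not say they do.

```python
from typing import Dict, List, Optional, Tuple

# Component gauges from the Memory table, all counted against the budget.
COMPONENTS = (
    "tsink_memory_registry_bytes",
    "tsink_memory_metadata_cache_bytes",
    "tsink_memory_persisted_index_bytes",
    "tsink_memory_persisted_mmap_bytes",
    "tsink_memory_tombstone_bytes",
)


def memory_report(
    samples: Dict[str, float],
) -> Tuple[Optional[float], List[Tuple[str, float]]]:
    """Return (used/budget ratio or None, present components sorted largest first)."""
    used = samples.get("tsink_memory_used_bytes", 0.0)
    budget = samples.get("tsink_memory_budget_bytes", 0.0)
    ratio = used / budget if budget > 0 else None
    parts = [(name, samples[name]) for name in COMPONENTS if name in samples]
    parts.sort(key=lambda item: item[1], reverse=True)
    return ratio, parts
```

A ratio above 0.90 corresponds to the memory-pressure threshold suggested under Alert recommendations. The component list then shows which subsystem is responsible.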

Write-Ahead Log (WAL)

| Metric | Type | Description |
|---|---|---|
| `tsink_wal_enabled` | gauge | WAL enabled (1) or disabled (0) |
| `tsink_wal_size_bytes` | gauge | WAL size on disk |
| `tsink_wal_segments` | gauge | WAL segment files present |
| `tsink_wal_active_segment` | gauge | Current WAL segment id |
| `tsink_wal_acknowledged_writes_durable` | gauge | Whether acknowledged writes are fsync-durable (1) or append-only (0) |
| `tsink_wal_highwater_segment` | gauge | Last appended WAL highwater segment |
| `tsink_wal_highwater_frame` | gauge | Last appended WAL highwater frame |
| `tsink_wal_durable_highwater_segment` | gauge | Last durable WAL highwater segment |
| `tsink_wal_durable_highwater_frame` | gauge | Last durable WAL highwater frame |
| `tsink_wal_replay_runs_total` | counter | WAL replay runs |
| `tsink_wal_replay_frames_total` | counter | WAL replayed frames |
| `tsink_wal_replay_series_definitions_total` | counter | WAL replayed series definitions |
| `tsink_wal_replay_sample_batches_total` | counter | WAL replayed sample batches |
| `tsink_wal_replay_points_total` | counter | WAL replayed points |
| `tsink_wal_replay_errors_total` | counter | WAL replay errors |
| `tsink_wal_replay_duration_nanoseconds_total` | counter | WAL replay runtime |
| `tsink_wal_append_series_definitions_total` | counter | WAL appended series definitions |
| `tsink_wal_append_sample_batches_total` | counter | WAL appended sample batches |
| `tsink_wal_append_points_total` | counter | WAL appended points |
| `tsink_wal_append_bytes_total` | counter | WAL appended bytes |
| `tsink_wal_append_errors_total` | counter | WAL append errors |
| `tsink_wal_resets_total` | counter | WAL resets |
| `tsink_wal_reset_errors_total` | counter | WAL reset errors |
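The appended and durable highwater gauges can be compared to see whether some appended WAL frames are not yet durable. This sketch rests on an assumption the table above does not confirm: a WAL position orders first by segment id and then by frame, so comparing `(segment, frame)` tuples is meaningful.

```python
from typing import Dict, Tuple


def wal_position(samples: Dict[str, float], prefix: str) -> Tuple[int, int]:
    """Build a (segment, frame) tuple from <prefix>_segment and <prefix>_frame gauges."""
    return (int(samples[f"{prefix}_segment"]), int(samples[f"{prefix}_frame"]))


def wal_has_undurable_tail(samples: Dict[str, float]) -> bool:
    """True when the last appended position is ahead of the last durable one.

    Assumption: positions order lexicographically by (segment, frame).
    """
    appended = wal_position(samples, "tsink_wal_highwater")
    durable = wal_position(samples, "tsink_wal_durable_highwater")
    return appended > durable
```

How to read the result depends on `tsink_wal_acknowledged_writes_durable`. In append-only mode (0), a gap is likely expected by design. In fsync-durable mode (1), a gap that persists across many scrapes may merit a closer look.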

Flush pipeline

The flush pipeline moves active (in-memory) chunks into persisted segments and manages tier lifecycle.
| Metric | Type | Description |
|---|---|---|
| `tsink_flush_pipeline_runs_total` | counter | Flush pipeline runs |
| `tsink_flush_pipeline_success_total` | counter | Successful flush pipeline runs |
| `tsink_flush_pipeline_timeout_total` | counter | Flush pipeline write-timeout skips |
| `tsink_flush_pipeline_errors_total` | counter | Flush pipeline errors |
| `tsink_flush_pipeline_duration_nanoseconds_total` | counter | Flush pipeline runtime |
| `tsink_flush_active_runs_total` | counter | Active chunk flush runs |
| `tsink_flush_active_errors_total` | counter | Active chunk flush errors |
| `tsink_flush_active_series_total` | counter | Active series flushed into sealed chunks |
| `tsink_flush_active_chunks_total` | counter | Active chunks flushed |
| `tsink_flush_active_points_total` | counter | Active points flushed |
| `tsink_flush_persist_runs_total` | counter | Persist attempts |
| `tsink_flush_persist_success_total` | counter | Successful persist runs |
| `tsink_flush_persist_noop_total` | counter | Persist runs with no new chunks |
| `tsink_flush_persist_errors_total` | counter | Persist errors |
| `tsink_flush_persisted_series_total` | counter | Series persisted |
| `tsink_flush_persisted_chunks_total` | counter | Chunks persisted |
| `tsink_flush_persisted_points_total` | counter | Points persisted |
| `tsink_flush_persisted_segments_total` | counter | Segments emitted by persist |
| `tsink_flush_persist_duration_nanoseconds_total` | counter | Persist runtime |
| `tsink_flush_evicted_sealed_chunks_total` | counter | Sealed chunks evicted after persistence |
| `tsink_flush_tier_moves_total` | counter | Persisted segments moved across tiers |
| `tsink_flush_tier_move_errors_total` | counter | Tier-move errors |
| `tsink_flush_expired_segments_total` | counter | Segments expired by retention |
| `tsink_flush_hot_segments_visible` | gauge | Hot-tier persisted segments visible to queries |
| `tsink_flush_warm_segments_visible` | gauge | Warm-tier persisted segments visible to queries |
| `tsink_flush_cold_segments_visible` | gauge | Cold-tier persisted segments visible to queries |

Compaction

| Metric | Type | Description |
|---|---|---|
| `tsink_compaction_runs_total` | counter | Compaction invocations |
| `tsink_compaction_success_total` | counter | Compaction runs that rewrote segments |
| `tsink_compaction_noop_total` | counter | Compaction runs with no rewrite |
| `tsink_compaction_errors_total` | counter | Compaction errors |
| `tsink_compaction_source_segments_total` | counter | Source segments considered |
| `tsink_compaction_output_segments_total` | counter | Output segments emitted |
| `tsink_compaction_source_chunks_total` | counter | Source chunks considered |
| `tsink_compaction_output_chunks_total` | counter | Output chunks emitted |
| `tsink_compaction_source_points_total` | counter | Source points considered |
| `tsink_compaction_output_points_total` | counter | Output points emitted |
| `tsink_compaction_duration_nanoseconds_total` | counter | Compaction runtime |
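Each `*_duration_nanoseconds_total` counter is cumulative, so the average cost of a run comes from dividing its increase by the increase in the matching `*_runs_total` counter. In PromQL that is `rate(tsink_compaction_duration_nanoseconds_total[5m]) / rate(tsink_compaction_runs_total[5m]) / 1e9`. The sketch below does the same with two scrapes held as `{metric name: value}` mappings. It treats a counter that goes down as a process restart, the same convention Prometheus uses for counter resets.

```python
from typing import Dict, Optional


def counter_delta(prev: float, curr: float) -> float:
    """Increase of a monotonic counter; a drop is treated as a restart from zero."""
    return curr if curr < prev else curr - prev


def avg_run_seconds(
    prev: Dict[str, float], curr: Dict[str, float], prefix: str
) -> Optional[float]:
    """Mean seconds per run between two scrapes, or None if no run completed.

    Uses <prefix>_runs_total and <prefix>_duration_nanoseconds_total.
    """
    runs = counter_delta(prev[f"{prefix}_runs_total"], curr[f"{prefix}_runs_total"])
    nanos = counter_delta(
        prev[f"{prefix}_duration_nanoseconds_total"],
        curr[f"{prefix}_duration_nanoseconds_total"],
    )
    return nanos / runs / 1e9 if runs > 0 else None
```

The same prefix convention applies to `tsink_flush_pipeline`, `tsink_flush_persist`, and `tsink_wal_replay`, which all expose a `_runs_total` and `_duration_nanoseconds_total` pair.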

Query

| Metric | Type | Description |
|---|---|---|
| `tsink_query_select_calls_total` | counter | `select` calls |
| `tsink_query_select_errors_total` | counter | `select` errors |
| `tsink_query_select_duration_nanoseconds_total` | counter | `select` runtime |
| `tsink_query_select_points_returned_total` | counter | Points returned by `select` |
| `tsink_query_select_with_options_calls_total` | counter | `select_with_options` calls |
| `tsink_query_select_with_options_errors_total` | counter | `select_with_options` errors |
| `tsink_query_select_with_options_duration_nanoseconds_total` | counter | `select_with_options` runtime |
| `tsink_query_select_with_options_points_returned_total` | counter | Points returned by `select_with_options` |
| `tsink_query_select_all_calls_total` | counter | `select_all` calls |
| `tsink_query_select_all_errors_total` | counter | `select_all` errors |
| `tsink_query_select_all_duration_nanoseconds_total` | counter | `select_all` runtime |
| `tsink_query_select_all_series_returned_total` | counter | Series returned by `select_all` |
| `tsink_query_select_all_points_returned_total` | counter | Points returned by `select_all` |
| `tsink_query_select_series_calls_total` | counter | `select_series` calls |
| `tsink_query_select_series_errors_total` | counter | `select_series` errors |
| `tsink_query_select_series_duration_nanoseconds_total` | counter | `select_series` runtime |
| `tsink_query_select_series_returned_total` | counter | Series returned by `select_series` |
| `tsink_query_merge_path_queries_total` | counter | Series collections using the merge path |
| `tsink_query_merge_path_shard_snapshots_total` | counter | Merge-path shard snapshots taken |
| `tsink_query_merge_path_shard_snapshot_wait_nanoseconds_total` | counter | Merge-path time waiting for shard read locks |
| `tsink_query_merge_path_shard_snapshot_hold_nanoseconds_total` | counter | Merge-path time holding shard read locks |
| `tsink_query_append_sort_path_queries_total` | counter | Series collections using the append/sort path |
| `tsink_query_hot_only_plans_total` | counter | Query plans satisfied from the hot tier only |
| `tsink_query_warm_tier_plans_total` | counter | Query plans that include the warm tier |
| `tsink_query_cold_tier_plans_total` | counter | Query plans that include the cold tier |
| `tsink_query_hot_tier_persisted_chunks_read_total` | counter | Hot-tier persisted chunks decoded |
| `tsink_query_warm_tier_persisted_chunks_read_total` | counter | Warm-tier persisted chunks decoded |
| `tsink_query_cold_tier_persisted_chunks_read_total` | counter | Cold-tier persisted chunks decoded |
| `tsink_query_warm_tier_fetch_duration_nanoseconds_total` | counter | Warm-tier chunk fetch and decode time |
| `tsink_query_cold_tier_fetch_duration_nanoseconds_total` | counter | Cold-tier chunk fetch and decode time |
| `tsink_query_rollup_plans_total` | counter | Queries that used persisted rollup artifacts |
| `tsink_query_partial_rollup_plans_total` | counter | Queries that mixed rollups with raw tail reads |
| `tsink_query_rollup_points_read_total` | counter | Persisted rollup points read |
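Each `select` variant exposes a `_calls_total` and an `_errors_total` counter, so a per-operation error ratio is the increase in errors divided by the increase in calls. In PromQL that is, for example, `rate(tsink_query_select_errors_total[5m]) / rate(tsink_query_select_calls_total[5m])`. A two-scrape sketch over `{metric name: value}` mappings, with the operation names taken from the table above:

```python
from typing import Dict, Optional

# Operation names as they appear in the Query metric names.
OPERATIONS = ("select", "select_with_options", "select_all", "select_series")


def counter_delta(prev: float, curr: float) -> float:
    """Increase of a monotonic counter; a drop is treated as a restart from zero."""
    return curr if curr < prev else curr - prev


def query_error_ratios(
    prev: Dict[str, float], curr: Dict[str, float]
) -> Dict[str, Optional[float]]:
    """errors/calls per operation between two scrapes; None when there were no calls."""
    ratios: Dict[str, Optional[float]] = {}
    for op in OPERATIONS:
        calls_key = f"tsink_query_{op}_calls_total"
        errors_key = f"tsink_query_{op}_errors_total"
        calls = counter_delta(prev.get(calls_key, 0.0), curr.get(calls_key, 0.0))
        errors = counter_delta(prev.get(errors_key, 0.0), curr.get(errors_key, 0.0))
        ratios[op] = errors / calls if calls > 0 else None
    return ratios
```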

Remote (object-store) storage

| Metric | Type | Description |
|---|---|---|
| `tsink_remote_storage_accessible` | gauge | 1 when object-store access is healthy |
| `tsink_remote_storage_compute_only` | gauge | 1 when running in compute-only mode |
| `tsink_remote_storage_mirror_hot_segments` | gauge | 1 when hot segments are mirrored to the object store |
| `tsink_remote_storage_catalog_refreshes_total` | counter | Remote catalog refreshes attempted |
| `tsink_remote_storage_catalog_refresh_errors_total` | counter | Remote catalog refresh errors |
| `tsink_remote_storage_catalog_refresh_consecutive_failures` | gauge | Consecutive catalog refresh failures |
| `tsink_remote_storage_catalog_refresh_backoff_active` | gauge | 1 when retry backoff is active |

Rollups

| Metric | Type | Description |
|---|---|---|
| `tsink_rollup_worker_runs_total` | counter | Rollup maintenance passes attempted |
| `tsink_rollup_worker_success_total` | counter | Successful maintenance passes |
| `tsink_rollup_worker_errors_total` | counter | Maintenance passes that errored |
| `tsink_rollup_policy_runs_total` | counter | Individual rollup policy evaluations |
| `tsink_rollup_buckets_materialized_total` | counter | Rollup buckets materialized |
| `tsink_rollup_points_materialized_total` | counter | Rollup points materialized |
| `tsink_rollup_last_run_duration_nanoseconds` | gauge | Duration of the most recent maintenance pass |
| `tsink_rollup_policy_status{policy,metric,aggregation,kind}` | gauge | Per-policy coverage, lag, and timing |
`tsink_rollup_policy_status` emits one series per configured rollup policy for each value of the `kind` label, which selects the dimension being reported:

| `kind` value | Meaning |
|---|---|
| `matched_series` | Series matched by the policy selector |
| `materialized_series` | Series with persisted rollup artifacts |
| `interval` | Configured rollup interval (milliseconds) |
| `materialized_through` | Latest materialized timestamp (Unix milliseconds) |
| `lag` | Materialization lag (milliseconds) |
| `last_run_duration_nanos` | Duration of the last policy run (nanoseconds) |
| `last_run_started_at_ms` | Start time of the last policy run (Unix milliseconds) |
| `last_run_completed_at_ms` | Completion time of the last policy run (Unix milliseconds) |
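Because every dimension of a policy is a separate series, a useful view comes from grouping them by `policy`. The sketch below takes `(labels, value)` pairs for `tsink_rollup_policy_status`, such as those produced by filtering a parsed scrape. It flags policies whose `lag` exceeds a threshold and computes coverage as `materialized_series / matched_series`. Keying on `policy` alone assumes policy names are unique. The policy names in the usage below are hypothetical.

```python
from typing import Dict, Iterable, Mapping, Optional, Tuple


def rollup_health(
    series: Iterable[Tuple[Mapping[str, str], float]], max_lag_ms: float
) -> Dict[str, Dict[str, Optional[float]]]:
    """Group tsink_rollup_policy_status samples by policy; report lag and coverage."""
    by_policy: Dict[str, Dict[str, float]] = {}
    for labels, value in series:
        by_policy.setdefault(labels["policy"], {})[labels["kind"]] = value
    report: Dict[str, Dict[str, Optional[float]]] = {}
    for policy, kinds in by_policy.items():
        matched = kinds.get("matched_series", 0.0)
        materialized = kinds.get("materialized_series", 0.0)
        report[policy] = {
            "lag_ms": kinds.get("lag"),
            "lagging": kinds.get("lag", 0.0) > max_lag_ms,
            "coverage": materialized / matched if matched > 0 else None,
        }
    return report
```

For example, `rollup_health(series, max_lag_ms=600_000)` marks a policy as lagging once materialization falls more than ten minutes behind.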

Rules engine

| Metric | Type | Description |
|---|---|---|
| `tsink_rules_scheduler_runs_total` | counter | Rules scheduler ticks attempted |
| `tsink_rules_scheduler_skipped_not_leader_total` | counter | Ticks skipped because this node is not the cluster leader |
| `tsink_rules_scheduler_skipped_inflight_total` | counter | Ticks skipped because the previous run is still in flight |
| `tsink_rules_evaluated_total` | counter | Rules evaluated |
| `tsink_rules_evaluation_failures_total` | counter | Rule evaluations that errored |
| `tsink_rules_recording_rows_written_total` | counter | Samples written by recording rules |
| `tsink_rules_scheduler_active` | gauge | 1 when this node is the active rules scheduler |
| `tsink_rules_configured{kind}` | gauge | Configured groups, rules, pending alerts, firing alerts |
| `tsink_rules_runtime_limits{kind}` | gauge | Scheduler tick interval and per-evaluation limits |

Exemplars

| Metric | Type | Description |
|---|---|---|
| `tsink_exemplars_accepted_total` | counter | Exemplars accepted |
| `tsink_exemplars_rejected_total` | counter | Exemplars rejected |
| `tsink_exemplars_dropped_total` | counter | Exemplars dropped due to retention guardrails |
| `tsink_exemplars_query_requests_total` | counter | Exemplar query requests served |
| `tsink_exemplars_query_series_total` | counter | Exemplar series returned by queries |
| `tsink_exemplars_query_results_total` | counter | Exemplars returned by queries |
| `tsink_exemplars_stored{kind}` | gauge | Currently stored series and exemplars |
| `tsink_exemplar_limits{kind}` | gauge | Configured exemplar quotas and guardrails |

Ingest protocols

Prometheus remote write

| Metric | Type | Description |
|---|---|---|
| `tsink_prometheus_payload_feature_enabled{payload}` | gauge | Feature flag per payload kind (metadata, exemplar, histogram) |
| `tsink_prometheus_payload_accepted_total{payload}` | counter | Payloads accepted per kind |
| `tsink_prometheus_payload_rejected_total{payload}` | counter | Payloads rejected per kind |

OTLP

| Metric | Type | Description |
|---|---|---|
| `tsink_otlp_metrics_enabled` | gauge | OTLP metrics ingest enabled (1) or not (0) |
| `tsink_otlp_requests_total{outcome}` | counter | OTLP `/v1/metrics` requests, labeled `accepted` or `rejected` |
| `tsink_otlp_data_points_total{kind,outcome}` | counter | OTLP data points by metric kind and outcome |
| `tsink_otlp_exemplars_total{outcome}` | counter | OTLP exemplars by outcome |
| `tsink_otlp_supported_shape{shape}` | gauge | 1 for each supported OTLP metric shape |

Legacy ingest (StatsD, Graphite, InfluxDB)

| Metric | Type | Description |
|---|---|---|
| `tsink_legacy_ingest_enabled{adapter}` | gauge | 1 for each enabled legacy adapter |

Admission control

Write and read admission are tracked independently.

Write admission

| Metric | Type | Description |
|---|---|---|
| `tsink_write_admission_rejections_total` | counter | Total public write admission rejections |
| `tsink_write_admission_request_slot_rejections_total` | counter | Rejections due to concurrency saturation |
| `tsink_write_admission_row_budget_rejections_total` | counter | Rejections due to in-flight row saturation |
| `tsink_write_admission_oversize_rows_rejections_total` | counter | Rejections for requests exceeding the row budget |
| `tsink_write_admission_acquire_wait_nanoseconds_total` | counter | Wait time acquiring admission permits |
| `tsink_write_admission_active_requests` | gauge | Active requests holding admission slots |
| `tsink_write_admission_active_rows` | gauge | Active rows reserved against the admission budget |

Read admission

| Metric | Type | Description |
|---|---|---|
| `tsink_read_admission_rejections_total` | counter | Total public read admission rejections |
| `tsink_read_admission_request_slot_rejections_total` | counter | Rejections due to concurrency saturation |
| `tsink_read_admission_query_budget_rejections_total` | counter | Rejections due to in-flight query saturation |
| `tsink_read_admission_oversize_queries_rejections_total` | counter | Rejections for requests exceeding the query budget |
| `tsink_read_admission_acquire_wait_nanoseconds_total` | counter | Wait time acquiring admission permits |
| `tsink_read_admission_active_requests` | gauge | Active requests holding admission slots |

Per-tenant admission

Tenant admission metrics carry a tenant label when multi-tenancy is enabled.

Edge sync

Edge sync ships writes queued on edge/source nodes upstream. Metrics are emitted per role.
| Metric | Type | Description |
|---|---|---|
| `tsink_edge_sync_enabled{role}` | gauge | Source and accept mode enablement |
| `tsink_edge_sync_queue{kind}` | gauge | Backlog entries, bytes, log size, oldest age, and retention window |
| `tsink_edge_sync_events_total{event}` | counter | Enqueue, replay, and retention-drop events |
| `tsink_edge_sync_replayed_rows_total` | counter | Rows replayed upstream |
| `tsink_edge_sync_accept_dedupe{...}` | gauge/counter | Accept-side idempotency window state |

Cluster — write routing

| Metric | Type | Description |
|---|---|---|
| `tsink_cluster_write_requests_total` | counter | Write requests routed through the coordinator |
| `tsink_cluster_write_local_rows_total` | counter | Rows inserted locally |
| `tsink_cluster_write_routed_rows_total` | counter | Rows forwarded to remote owners |
| `tsink_cluster_write_routed_batches_total` | counter | Remote write batches sent |
| `tsink_cluster_write_failures_total` | counter | Write routing failures |
| `tsink_cluster_write_shard_rows_total{shard}` | counter | Rows routed per shard |
| `tsink_cluster_write_peer_routed_rows_total{node_id}` | counter | Rows routed per peer |
| `tsink_cluster_write_peer_routed_batches_total{node_id}` | counter | Batches routed per peer |
| `tsink_cluster_write_remote_requests_total{node_id}` | counter | Remote write RPC requests per peer |
| `tsink_cluster_write_remote_failures_total{node_id}` | counter | Remote write RPC failures per peer |
| `tsink_cluster_write_remote_request_duration_seconds{node_id,le}` | histogram | Remote write RPC latency per peer |
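Per-peer RPC latency is exported as a Prometheus histogram. Assuming the buckets follow the standard `_bucket` series naming of the text format, a per-peer p99 in PromQL is `histogram_quantile(0.99, sum by (node_id, le) (rate(tsink_cluster_write_remote_request_duration_seconds_bucket[5m])))`. For offline analysis of a single scrape, the sketch below applies the same linear interpolation as Prometheus' `histogram_quantile` to cumulative `(le, count)` pairs for one peer. It is simplified: it assumes non-negative bucket bounds and requires a `+Inf` bucket.

```python
import math
from typing import Iterable, Tuple


def histogram_quantile(q: float, buckets: Iterable[Tuple[float, float]]) -> float:
    """Estimate the q-quantile from cumulative (upper_bound, count) buckets.

    Finds the first bucket whose cumulative count reaches q * total and
    interpolates linearly inside it. If that bucket is +Inf, returns the
    largest finite upper bound instead.
    """
    if not 0.0 < q < 1.0:
        raise ValueError("q must be strictly between 0 and 1")
    ordered = sorted(buckets)
    if not ordered or ordered[-1][0] != math.inf:
        raise ValueError("a +Inf bucket is required")
    total = ordered[-1][1]
    if len(ordered) < 2 or total == 0:
        return math.nan
    rank = q * total
    lower_bound, lower_count = 0.0, 0.0
    for upper_bound, count in ordered:
        if count >= rank:
            if upper_bound == math.inf:
                return lower_bound
            fraction = (rank - lower_count) / (count - lower_count)
            return lower_bound + (upper_bound - lower_bound) * fraction
        lower_bound, lower_count = upper_bound, count
    return math.nan  # unreachable when counts are cumulative
```

The same function applies to `tsink_cluster_fanout_remote_request_duration_seconds`, one `(node_id, operation)` pair at a time.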

Cluster — deduplication

| Metric | Type | Description |
|---|---|---|
| `tsink_cluster_dedupe_requests_total` | counter | Idempotency key checks |
| `tsink_cluster_dedupe_accepted_total` | counter | Requests accepted for dedupe tracking |
| `tsink_cluster_dedupe_duplicates_total` | counter | Requests deduplicated |
| `tsink_cluster_dedupe_inflight_rejections_total` | counter | Conflicts while a key is in flight |
| `tsink_cluster_dedupe_commits_total` | counter | Dedupe marker commits |
| `tsink_cluster_dedupe_aborts_total` | counter | Dedupe marker aborts |
| `tsink_cluster_dedupe_cleanup_runs_total` | counter | Cleanup runs |
| `tsink_cluster_dedupe_expired_keys_total` | counter | Keys expired by TTL |
| `tsink_cluster_dedupe_evicted_keys_total` | counter | Keys evicted by the size bound |
| `tsink_cluster_dedupe_persistence_failures_total` | counter | Dedupe marker persistence failures |
| `tsink_cluster_dedupe_active_keys` | gauge | Active dedupe keys in the window |
| `tsink_cluster_dedupe_inflight_keys` | gauge | In-flight dedupe keys |
| `tsink_cluster_dedupe_log_bytes` | gauge | Durable dedupe marker log size on disk |

Cluster — read fanout

| Metric | Type | Description |
|---|---|---|
| `tsink_cluster_fanout_requests_total` | counter | Read fanout requests |
| `tsink_cluster_fanout_failures_total` | counter | Read fanout failures |
| `tsink_cluster_fanout_duration_nanoseconds_total` | counter | Fanout execution time |
| `tsink_cluster_fanout_remote_requests_total` | counter | Remote RPC requests |
| `tsink_cluster_fanout_remote_failures_total` | counter | Remote RPC failures |
| `tsink_cluster_fanout_resource_rejections_total` | counter | Guardrail rejections |
| `tsink_cluster_fanout_resource_acquire_wait_nanoseconds_total` | counter | Wait time for query permits |
| `tsink_cluster_fanout_resource_active_queries` | gauge | Active distributed reads holding permits |
| `tsink_cluster_fanout_resource_active_merged_points` | gauge | Merged-point budget in use |
| `tsink_cluster_fanout_operation_requests_total{operation}` | counter | Fanout requests per operation |
| `tsink_cluster_fanout_operation_failures_total{operation}` | counter | Fanout failures per operation |
| `tsink_cluster_fanout_remote_requests_by_peer_total{node_id,operation}` | counter | RPC requests per peer and operation |
| `tsink_cluster_fanout_remote_failures_by_peer_total{node_id,operation}` | counter | RPC failures per peer and operation |
| `tsink_cluster_fanout_remote_request_duration_seconds{node_id,operation,le}` | histogram | RPC latency per peer and operation |

Cluster — read planner

| Metric | Type | Description |
|---|---|---|
| `tsink_cluster_read_planner_requests_total` | counter | Read planner requests |
| `tsink_cluster_read_planner_candidate_shards_total` | counter | Candidate shards evaluated |
| `tsink_cluster_read_planner_pruned_shards_total` | counter | Shards pruned |
| `tsink_cluster_read_planner_local_shards_total` | counter | Local shards selected |
| `tsink_cluster_read_planner_remote_targets_total` | counter | Remote peer targets selected |
| `tsink_cluster_read_planner_remote_shards_total` | counter | Remote shard assignments |
| `tsink_cluster_read_planner_operation_requests_total{operation}` | counter | Planner requests per operation |
| `tsink_cluster_read_planner_operation_candidate_shards_total{operation}` | counter | Candidate shards per operation |
| `tsink_cluster_read_planner_operation_pruned_shards_total{operation}` | counter | Pruned shards per operation |
| `tsink_cluster_read_planner_operation_remote_targets_total{operation}` | counter | Remote targets per operation |

Cluster — hinted handoff (outbox)

| Metric | Type | Description |
|---|---|---|
| `tsink_cluster_outbox_enqueued_total` | counter | Replica batches enqueued |
| `tsink_cluster_outbox_enqueue_rejected_total` | counter | Enqueue rejections due to quota limits |
| `tsink_cluster_outbox_persistence_failures_total` | counter | Outbox persistence failures |
| `tsink_cluster_outbox_replay_attempts_total` | counter | Outbox replay attempts |
| `tsink_cluster_outbox_replay_success_total` | counter | Successful replays |
| `tsink_cluster_outbox_replay_failures_total` | counter | Failed replays |
| `tsink_cluster_outbox_queued_entries` | gauge | Pending outbox entries |
| `tsink_cluster_outbox_queued_bytes` | gauge | Pending outbox bytes |
| `tsink_cluster_outbox_log_bytes` | gauge | Outbox log file size |
| `tsink_cluster_outbox_stale_records` | gauge | Stale log records pending cleanup |
| `tsink_cluster_outbox_stalled_peers` | gauge | Peers currently stalled |
| `tsink_cluster_outbox_stalled_oldest_age_milliseconds` | gauge | Age of the oldest stalled peer backlog |
| `tsink_cluster_outbox_cleanup_runs_total` | counter | Cleanup worker iterations |
| `tsink_cluster_outbox_cleanup_compactions_total` | counter | Cleanup-triggered compactions |
| `tsink_cluster_outbox_cleanup_reclaimed_bytes_total` | counter | Bytes reclaimed by cleanup |
| `tsink_cluster_outbox_cleanup_failures_total` | counter | Cleanup failures |
| `tsink_cluster_outbox_stalled_alerts_total` | counter | Stalled-peer alert transitions |
| `tsink_cluster_outbox_peer_queued_entries{node_id}` | gauge | Pending entries per peer |
| `tsink_cluster_outbox_peer_queued_bytes{node_id}` | gauge | Pending bytes per peer |
| `tsink_cluster_outbox_peer_stalled{node_id}` | gauge | 1 when a peer is stalled |

Cluster — control plane

| Metric | Type | Description |
|---|---|---|
| `tsink_cluster_control_current_term` | gauge | Current consensus term |
| `tsink_cluster_control_commit_index` | gauge | Current committed control-log index |
| `tsink_cluster_control_leader_stale` | gauge | 1 if the current leader is considered stale |
| `tsink_cluster_control_leader_contact_age_ms` | gauge | Milliseconds since the last leader heartbeat |
| `tsink_cluster_control_suspect_peers` | gauge | Peers currently marked suspect |
| `tsink_cluster_control_dead_peers` | gauge | Peers currently marked dead |
| `tsink_cluster_control_peer_status{node_id,status}` | gauge | One-hot liveness status per peer (`unknown`, `healthy`, `suspect`, `dead`) |
| `tsink_cluster_control_peer_last_success_unix_ms{node_id}` | gauge | Last successful heartbeat per peer |
| `tsink_cluster_control_peer_last_failure_unix_ms{node_id}` | gauge | Last failed attempt per peer |
| `tsink_cluster_control_peer_consecutive_failures{node_id}` | gauge | Consecutive failures per peer |
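Because `tsink_cluster_control_peer_status` is one-hot, each peer has one series per possible status, and only the current status has value 1. In PromQL, `tsink_cluster_control_peer_status{status!="healthy"} == 1` selects peers that are not healthy. The sketch below collapses `(labels, value)` pairs from one scrape into a `node_id -> status` map. The node ids in the test are hypothetical.

```python
from typing import Dict, Iterable, Mapping, Tuple

Sample = Tuple[Mapping[str, str], float]


def peer_statuses(series: Iterable[Sample]) -> Dict[str, str]:
    """Collapse one-hot tsink_cluster_control_peer_status samples to node_id -> status."""
    statuses: Dict[str, str] = {}
    for labels, value in series:
        if value == 1:  # only the active status carries a 1
            statuses[labels["node_id"]] = labels["status"]
    return statuses


def unhealthy_peers(series: Iterable[Sample]) -> Dict[str, str]:
    """Peers whose current status is anything other than 'healthy'."""
    return {node: status for node, status in peer_statuses(series).items() if status != "healthy"}
```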

Security & RBAC

| Metric | Type | Description |
|---|---|---|
| `tsink_secret_rotation_generation{target}` | gauge | Current rotation generation per secret target |
| `tsink_secret_rotation_reload_total{target}` | counter | Secret reload operations per target |
| `tsink_secret_rotation_total{target}` | counter | Secret rotation operations per target |
| `tsink_secret_rotation_failures_total{target}` | counter | Failures per target |
| `tsink_secret_rotation_last_success_unix_ms{target}` | gauge | Last successful reload or rotation (Unix ms) |
| `tsink_secret_rotation_last_failure_unix_ms{target}` | gauge | Last failure (Unix ms) |
| `tsink_secret_rotation_previous_credential_active{target}` | gauge | 1 during the overlap grace window |
| `tsink_rbac_service_accounts_total` | gauge | Configured RBAC service accounts |
| `tsink_rbac_service_accounts_disabled` | gauge | Disabled RBAC service accounts |
| `tsink_rbac_service_accounts_last_rotated_unix_ms` | gauge | Latest service-account rotation timestamp |

Usage accounting

| Metric | Type | Description |
|---|---|---|
| `tsink_usage_ledger_records_total` | gauge | Durable or in-memory tenant usage ledger records |
| `tsink_usage_ledger_tenants_total` | gauge | Distinct tenants in the usage ledger |
| `tsink_usage_ledger_storage_reconciliations_total` | counter | Storage reconciliation snapshots recorded |
| `tsink_usage_ledger_durable` | gauge | 1 when the ledger is backed by a durable on-disk store |

Alert recommendations

The following metrics are good starting points for alerts:
| Concern | Metric | Threshold guidance |
|---|---|---|
| WAL errors | `tsink_wal_append_errors_total` | Rate > 0 for 1 minute |
| WAL replay errors | `tsink_wal_replay_errors_total` | Any increase |
| Flush errors | `tsink_flush_pipeline_errors_total` | Rate > 0 for 2 minutes |
| Persist errors | `tsink_flush_persist_errors_total` | Rate > 0 |
| Compaction errors | `tsink_compaction_errors_total` | Rate > 0 |
| Memory pressure | `tsink_memory_used_bytes / tsink_memory_budget_bytes` | > 0.90 |
| Write admission rejections | `tsink_write_admission_rejections_total` | Rate > 0, sustained |
| Read admission rejections | `tsink_read_admission_rejections_total` | Rate > 0, sustained |
| Object store inaccessible | `tsink_remote_storage_accessible` | == 0 for 2 minutes |
| Remote catalog backoff | `tsink_remote_storage_catalog_refresh_consecutive_failures` | > 3 |
| Dead cluster peers | `tsink_cluster_control_dead_peers` | > 0 |
| Leader stale | `tsink_cluster_control_leader_stale` | == 1 for 1 minute |
| Hinted handoff stalled | `tsink_cluster_outbox_stalled_peers` | > 0 |
| Secret rotation failure | `tsink_secret_rotation_failures_total` | Any increase |
| Rollup lag | `tsink_rollup_policy_status{kind="lag"}` | > your acceptable lag (milliseconds) |
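The guidance above is meant for Prometheus alerting rules. As a lightweight alternative, such as a cron-driven health check, the sketch below evaluates a subset of the unlabeled rows against two consecutive scrapes held as `{metric name: value}` mappings. It does not model the "for N minutes" durations, because it has no memory beyond the two scrapes, and it skips the labeled metrics. Missing metrics are treated as absent, not as alerts.

```python
from typing import Dict, List

# Error counters from the table above where an increase between scrapes should alert.
ERROR_COUNTERS = {
    "WAL errors": "tsink_wal_append_errors_total",
    "WAL replay errors": "tsink_wal_replay_errors_total",
    "Flush errors": "tsink_flush_pipeline_errors_total",
    "Persist errors": "tsink_flush_persist_errors_total",
    "Compaction errors": "tsink_compaction_errors_total",
}


def counter_delta(prev: float, curr: float) -> float:
    """Increase of a monotonic counter; a drop is treated as a restart from zero."""
    return curr if curr < prev else curr - prev


def evaluate(prev: Dict[str, float], curr: Dict[str, float]) -> List[str]:
    """Return the concerns (table row names) that fire for this pair of scrapes."""
    fired = []
    for concern, metric in ERROR_COUNTERS.items():
        if metric in prev and metric in curr and counter_delta(prev[metric], curr[metric]) > 0:
            fired.append(concern)
    budget = curr.get("tsink_memory_budget_bytes", 0.0)
    if budget > 0 and curr.get("tsink_memory_used_bytes", 0.0) / budget > 0.90:
        fired.append("Memory pressure")
    if curr.get("tsink_remote_storage_catalog_refresh_consecutive_failures", 0.0) > 3:
        fired.append("Remote catalog backoff")
    if curr.get("tsink_cluster_control_dead_peers", 0.0) > 0:
        fired.append("Dead cluster peers")
    if curr.get("tsink_cluster_outbox_stalled_peers", 0.0) > 0:
        fired.append("Hinted handoff stalled")
    return fired
```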

Support bundles

GET /api/v1/admin/support_bundle?tenant=<id>
Downloads a bounded JSON diagnostic snapshot for a single tenant. Requires the admin:read RBAC permission. The response is returned as a downloadable .json file with a Content-Disposition: attachment header. The bundle includes:
| Section | Contents |
|---|---|
| `statusTsdb` | TSDB status endpoint snapshot |
| `usage` | Tenant usage accounting summary |
| `rbacState` | Live RBAC roles, service accounts, and OIDC mappings |
| `rbacAudit` | Last 50 RBAC decision and reload audit entries |
| `securityState` | Secret rotation and TLS state |
| `clusterAudit` | Last 50 cluster admin mutation log entries |
| `clusterHandoff` | Cluster handoff progress |
| `clusterRepair` | Cluster repair progress |
| `clusterRebalance` | Shard rebalance progress |
| `rules` | Rules engine status |
| `rollups` | Rollup policy freshness and coverage |
```shell
curl -H "Authorization: Bearer $TOKEN" \
  'http://127.0.0.1:9201/api/v1/admin/support_bundle?tenant=default' \
  -o tsink-support-bundle.json
```
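The same download can be scripted, for example to attach a bundle to a ticket automatically. This standard-library sketch builds the request used by the `curl` command above and checks a downloaded bundle for the documented sections. It assumes the section names in the table are top-level keys of the JSON object. That is not confirmed here, and a missing section is not necessarily an error; cluster sections on a single-node deployment, for instance, might be absent or empty.

```python
import json
import urllib.parse
import urllib.request
from typing import List

# Section names from the table above, assumed to be top-level JSON keys.
EXPECTED_SECTIONS = (
    "statusTsdb", "usage", "rbacState", "rbacAudit", "securityState",
    "clusterAudit", "clusterHandoff", "clusterRepair", "clusterRebalance",
    "rules", "rollups",
)


def support_bundle_request(base_url: str, tenant: str, token: str) -> urllib.request.Request:
    """Build the authenticated GET for /api/v1/admin/support_bundle."""
    query = urllib.parse.urlencode({"tenant": tenant})
    req = urllib.request.Request(f"{base_url.rstrip('/')}/api/v1/admin/support_bundle?{query}")
    req.add_header("Authorization", f"Bearer {token}")
    return req


def missing_sections(bundle_json: str) -> List[str]:
    """List documented sections that are absent from a downloaded bundle."""
    bundle = json.loads(bundle_json)
    return [name for name in EXPECTED_SECTIONS if name not in bundle]
```

To download, pass the request to `urllib.request.urlopen` and read the body. The token needs the `admin:read` permission, as noted above.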