Storage and Time-Series Boundary

DFS uses different storage paths for governed datasets, current operational values, and high-rate time-series history. Use this page when planning customer-side deployments, connector onboarding, AI Engine pipelines, BI datasets, or predictive-maintenance signal history.

The goal is simple: operational reads should be tenant-scoped, repeatable, and traceable, while high-rate history uses storage designed for sustained ingest.

Storage model

| Data class | Typical store | Purpose |
|---|---|---|
| DFS dataset metadata | FactVerse backend database | Dataset ownership, schema, lineage, lifecycle, storage contract, steward state. |
| Materialized dataset rows | Backend-owned table or approved external location | Preview, profile, BI query, fusion input, AI Agent evidence. |
| DFS Lite staging rows | Bounded staging table | Recently mapped point rows awaiting promotion. |
| Current values | Current-value read model | Latest point value used by dashboards, asset context, and operational review. |
| Bounded trend values | Current-value history read model | Short trend windows for operational screens. |
| High-rate raw telemetry | ClickHouse when enabled by the deployment | Long-running telemetry history, rollups, and high-rate analytics. |
| Pipeline outputs | Backend-owned materialization contract | AI Engine or pipeline results published as governed datasets. |

Dataset storage contract

A materialized dataset should carry a storage contract that explains where rows live and whether DFS preview, profile, and BI query can use it.

Important contract fields include:

| Field | Meaning |
|---|---|
| physicalLocationType | Physical table, external URI, or metadata-only dataset. |
| physicalTableName | Table used by preview, profile, or BI when present. |
| physicalTableScope | Shared table with tenant column, tenant-owned table, metadata-only, or unverified existing table. |
| tenantColumnName | Tenant column used for scoped reads, normally tenant_id. |
| tenantPredicateMode | Tenant predicate shape used by preview, profile, and BI. |
| previewEligible and profileEligible | Whether DFS preview and profile can read the dataset. |
| biEligible | Whether BI dataset query can use the dataset. |

Shared materialized tables should include tenant_id. An existing table without tenant scope should remain outside shared BI and production AI workflows until the owner classifies or remediates it.
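
As a rough illustration of how the contract fields could gate shared BI reads, the sketch below models the contract as a Python dataclass. The field names come from the table above; the scope labels (`SHARED`, `TENANT_OWNED`, and so on), the location-type strings, and the `shared_bi_allowed` helper are assumptions made for this example and are not the platform's actual enum values or API.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical scope labels; the real values are not listed on this page.
SHARED = "shared_with_tenant_column"
TENANT_OWNED = "tenant_owned"
METADATA_ONLY = "metadata_only"
UNVERIFIED = "unverified_existing"


@dataclass
class StorageContract:
    physicalLocationType: str          # assumed labels: "table", "external_uri", "metadata_only"
    physicalTableName: Optional[str]   # None when no physical table backs the dataset
    physicalTableScope: str            # one of the hypothetical scope labels above
    tenantColumnName: Optional[str]    # normally "tenant_id" for shared tables
    tenantPredicateMode: Optional[str]
    previewEligible: bool
    profileEligible: bool
    biEligible: bool


def shared_bi_allowed(contract: StorageContract) -> bool:
    """Return True only when the contract looks safe for shared BI reads."""
    if not contract.biEligible or contract.physicalTableName is None:
        return False
    if contract.physicalTableScope == SHARED:
        # A shared table needs a tenant column so reads can be scoped.
        return bool(contract.tenantColumnName)
    # Metadata-only and unverified tables stay outside shared BI.
    return contract.physicalTableScope == TENANT_OWNED
```

Under these assumptions, an unverified legacy table is rejected even if its `biEligible` flag was set by mistake, which mirrors the rule that such tables stay out of shared BI until the owner classifies or remediates them.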

Current-value lifecycle

DFS Lite point sync writes mapped rows into a staging buffer. Promotion then updates:

  • the current-value read model for latest-value reads;
  • the bounded current-value history model for short trend windows.

Rows missing connector identity, mapped entity, or mapped field should stay visible as skipped promotion records so the source owner can repair mapping quality.
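
The promotion step described above can be pictured with a small in-memory sketch. This is not the DFS Lite implementation: the row keys (`connector_id`, `entity_id`, `field`), the `HISTORY_LIMIT` bound, and the `promote` function are hypothetical names chosen for illustration, and the sketch assumes staged rows arrive in timestamp order.

```python
from collections import deque

# Hypothetical column names for connector identity, mapped entity, mapped field.
REQUIRED = ("connector_id", "entity_id", "field")
# Hypothetical bound on trend points kept per point.
HISTORY_LIMIT = 3


def promote(staging_rows, current, history, skipped):
    """Promote staged rows into both read models and record skipped rows."""
    for row in staging_rows:
        missing = [name for name in REQUIRED if not row.get(name)]
        if missing:
            # Keep the row visible so the source owner can repair the mapping.
            skipped.append({"row": row, "missing": missing})
            continue
        key = (row["connector_id"], row["entity_id"], row["field"])
        # Latest-value read model: the newest row wins (rows assumed time-ordered).
        current[key] = row
        # Bounded history read model: deque drops the oldest point past the limit.
        history.setdefault(key, deque(maxlen=HISTORY_LIMIT)).append(row)
```

The bounded `deque` stands in for whatever retention rule the deployment applies to the current-value history model; the point is only that the trend store has a fixed ceiling while skipped rows remain inspectable.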

Use current-value reads for:

  • latest equipment context;
  • facility or data center dashboards;
  • field triage;
  • short operational trend review;
  • AI Agent context that needs current state.

Use high-rate history storage for:

  • long lookback windows;
  • predictive maintenance training data;
  • high-frequency telemetry analytics;
  • retention periods beyond the bounded trend window.
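
One way to express the two lists above is a small routing helper that picks a read path from the requested lookback. The 24-hour `BOUNDED_TREND_WINDOW`, the path labels, and the `choose_read_path` function are assumptions for this sketch; the actual trend window depends on the deployment's current-value history retention.

```python
from datetime import timedelta

# Hypothetical bound; the real value depends on current-value history retention.
BOUNDED_TREND_WINDOW = timedelta(hours=24)


def choose_read_path(lookback: timedelta, high_rate_enabled: bool) -> str:
    """Pick a read path label for a requested lookback window."""
    if lookback <= timedelta(0):
        # No lookback: the caller only needs the latest value.
        return "current-value"
    if lookback <= BOUNDED_TREND_WINDOW:
        # Short operational trend: served by the bounded history read model.
        return "current-value-history"
    if high_rate_enabled:
        # Long lookback: requires the high-rate history store.
        return "high-rate-history"
    raise ValueError("lookback exceeds the bounded trend window and no high-rate store is enabled")
```

Raising an error for long lookbacks without a high-rate store makes the gap explicit instead of silently returning a truncated trend.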

High-rate time-series storage

For sustained telemetry ingest, plan a high-rate storage path. A ClickHouse-backed deployment can own raw telemetry history and rollups while the backend database keeps metadata, staging, current values, governance, and contracts.

Plan these items before production onboarding:

| Area | Planning question |
|---|---|
| Ingest rate | Expected events per second, average bytes per event, burst behavior, and source schedule. |
| Retention | Raw retention, minute/hour/day rollups, customer data-retention policy, and backup scope. |
| Queue behavior | Pending, retry, sent, and dead-letter handling for telemetry writes. |
| Load budget | Whether projected 48-hour data growth fits the approved storage budget. |
| Tenant isolation | Tenant-scoped query shape, negative read checks, and role boundaries. |
| Operations | Monitoring, storage growth alerts, replay procedure, and incident closeout evidence. |

High-rate telemetry should pass a load gate before scheduled source onboarding. A failing load gate means the source contract needs throttling, rollup, retention changes, or capacity changes before production use.
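
The load-budget question is mostly arithmetic, so a back-of-the-envelope sketch may help. It multiplies the expected event rate by the average event size over a 48-hour window and compares the result with the approved budget. The function names are hypothetical, and the estimate deliberately ignores compression, replication, rollups, and burst behavior, all of which a real load gate would need to account for.

```python
def projected_growth_bytes(events_per_second: float,
                           avg_bytes_per_event: float,
                           hours: float = 48) -> float:
    """Raw, uncompressed data growth over the window, in bytes."""
    return events_per_second * avg_bytes_per_event * hours * 3600


def load_gate_passes(events_per_second: float,
                     avg_bytes_per_event: float,
                     budget_bytes: float,
                     hours: float = 48) -> bool:
    """True when projected growth fits inside the approved storage budget."""
    return projected_growth_bytes(events_per_second, avg_bytes_per_event, hours) <= budget_bytes
```

For example, 2,000 events per second at 200 bytes each is 400,000 bytes per second, or 69,120,000,000 bytes (about 69 GB) over 48 hours. Against a 50 GB budget the gate fails, which points to throttling, rollup, retention changes, or more capacity; against a 100 GB budget it passes.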

Pipeline output materialization

AI Engine and data pipelines should publish DFS outputs through a backend-owned materialization contract when the output becomes a governed dataset. The contract should carry:

  • pipeline, run, and node identity;
  • tenant identity;
  • row and column counts;
  • column schema;
  • bounded inline records when allowed;
  • storage contract metadata;
  • downstream dataset or warehouse reference returned by the backend.

This keeps dataset lifecycle, BI eligibility, tenant scoping, and audit ownership in the platform layer.
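
To make the list of contract contents concrete, the sketch below shows one possible shape for a materialization request and a few pre-publish checks. Everything here is illustrative: the `MaterializationRequest` class, its snake_case attribute names, and the `MAX_INLINE_RECORDS` bound are assumptions, not the backend's actual payload. The downstream dataset or warehouse reference is omitted because the backend returns it.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List

# Hypothetical bound on inline records carried with the request.
MAX_INLINE_RECORDS = 100


@dataclass
class MaterializationRequest:
    pipeline_id: str
    run_id: str
    node_id: str
    tenant_id: str
    columns: List[Dict[str, str]]      # e.g. [{"name": "asset_id", "type": "string"}]
    row_count: int
    inline_records: List[Dict[str, Any]] = field(default_factory=list)
    storage_contract: Dict[str, Any] = field(default_factory=dict)

    @property
    def column_count(self) -> int:
        return len(self.columns)

    def validate(self) -> List[str]:
        """Return a list of problems; an empty list means the request looks complete."""
        problems = []
        for name in ("pipeline_id", "run_id", "node_id", "tenant_id"):
            if not getattr(self, name):
                problems.append(f"missing {name}")
        if len(self.inline_records) > MAX_INLINE_RECORDS:
            problems.append("too many inline records")
        known = {col["name"] for col in self.columns}
        for record in self.inline_records:
            extra = set(record) - known
            if extra:
                problems.append(f"inline record has undeclared columns: {sorted(extra)}")
        return problems
```

Checking identity and schema before the request reaches the backend keeps malformed outputs from ever being registered as governed datasets.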

Validation checklist

  • Dataset storage contracts are present for materialized datasets used by preview, profile, BI, or AI Agent workflows.
  • Shared physical tables include the tenant column required by the storage contract.
  • Legacy unverified tables are excluded from shared BI and production AI workflows.
  • Current-value and trend reads come from promoted read models, not directly from the staging buffer.
  • High-rate telemetry has a storage owner, retention policy, durable queue behavior, and load-budget check.
  • Dead-letter telemetry rows have an operator review and replay process.
  • AI Engine outputs that become governed datasets use a backend-owned materialization contract.
  • Customer deployment capacity includes database I/O, ClickHouse or equivalent telemetry storage, queue capacity, backup scope, and monitoring.
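
The queue states named in the planning table (pending, retry, sent, dead-letter) and the replay item in this checklist can be sketched as a small state machine. The `MAX_ATTEMPTS` limit and the `next_state` and `replay` functions are hypothetical; a real deployment would persist these states durably and add backoff between retries.

```python
from typing import Tuple

# Hypothetical limit on failed write attempts before a row is dead-lettered.
MAX_ATTEMPTS = 3


def next_state(state: str, failed_attempts: int, write_ok: bool) -> Tuple[str, int]:
    """Advance one telemetry row after a write attempt."""
    if state not in ("pending", "retry"):
        # "sent" and "dead-letter" do not change on their own.
        return state, failed_attempts
    if write_ok:
        return "sent", failed_attempts
    failed_attempts += 1
    if failed_attempts >= MAX_ATTEMPTS:
        return "dead-letter", failed_attempts
    return "retry", failed_attempts


def replay(state: str, failed_attempts: int) -> Tuple[str, int]:
    """Operator replay: move a reviewed dead-letter row back to pending."""
    if state != "dead-letter":
        raise ValueError("only dead-letter rows can be replayed")
    return "pending", 0
```

Keeping replay as an explicit operator action, rather than an automatic retry, matches the checklist item that dead-letter rows get a review before they re-enter the queue.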

Troubleshooting

| Symptom | Check |
|---|---|
| Dataset preview is blocked | Storage contract, tenant column, table scope, dataset status, and user permission. |
| BI cannot query a dataset | biEligible, tenant-scoped table, column whitelist, and dataset validation state. |
| Latest values are stale | Connector sync, staging promotion status, skipped promotion reasons, and cleanup policy. |
| Trend window is too short | Current-value history retention, product expectation, and high-rate history requirement. |
| Telemetry queue grows | ClickHouse availability, retry count, dead-letter reason, ingest rate, and storage budget. |
| Pipeline output collides with a table | Use backend-owned materialization and treat direct table replacement as an exception requiring project approval. |