Storage and Time-Series Boundary

DFS uses different storage paths for governed datasets, current operational values, and high-rate time-series history. Use this page when planning customer-side deployments, connector onboarding, AI Engine pipelines, BI datasets, or predictive-maintenance signal history.

The goal is simple: operational reads should be tenant-scoped, repeatable, and traceable, while high-rate history uses storage designed for sustained ingest.

Storage model

Data class	Typical store	Purpose
DFS dataset metadata	FactVerse backend database	Dataset ownership, schema, lineage, lifecycle, storage contract, steward state.
Materialized dataset rows	Backend-owned table or approved external location	Preview, profile, BI query, fusion input, AI Agent evidence.
DFS Lite staging rows	Bounded staging table	Recently mapped point rows awaiting promotion.
Current values	Current-value read model	Latest point value used by dashboards, asset context, and operational review.
Bounded trend values	Current-value history read model	Short trend windows for operational screens.
High-rate raw telemetry	ClickHouse when enabled by the deployment	Long-running telemetry history, rollups, and high-rate analytics.
Pipeline outputs	Backend-owned materialization contract	AI Engine or pipeline results published as governed datasets.

Dataset storage contract

A materialized dataset should carry a storage contract that explains where rows live and whether DFS preview, profile, and BI query can use it.

Important contract fields include:

Field	Meaning
`physicalLocationType`	Physical table, external URI, or metadata-only dataset.
`physicalTableName`	Table used by preview, profile, or BI when present.
`physicalTableScope`	Shared table with tenant column, tenant-owned table, metadata-only, or unverified existing table.
`tenantColumnName`	Tenant column used for scoped reads, normally `tenant_id`.
`tenantPredicateMode`	Tenant predicate shape used by preview, profile, and BI.
`previewEligible` and `profileEligible`	Whether DFS preview and profile can read the dataset.
`biEligible`	Whether BI dataset query can use the dataset.

Shared materialized tables should include the tenant column named in the storage contract, normally `tenant_id`. An existing table without tenant scope should remain outside shared BI and production AI workflows until the owner classifies or remediates it.
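The contract fields above can be pictured as a small record plus a check that runs before a dataset is exposed to shared BI. The following is a minimal sketch, not the FactVerse implementation: the field names come from the table above, but the scope labels (`SHARED`, `UNVERIFIED`, and so on), the example values, and the `shared_bi_blockers` helper are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Set

# Hypothetical scope labels; this page does not list the real enum values.
SHARED = "shared_with_tenant_column"
TENANT_OWNED = "tenant_owned"
METADATA_ONLY = "metadata_only"
UNVERIFIED = "unverified_existing"


@dataclass
class StorageContract:
    # Field names follow the contract table; types are assumptions.
    physicalLocationType: str
    physicalTableName: Optional[str]
    physicalTableScope: str
    tenantColumnName: Optional[str]
    tenantPredicateMode: Optional[str]
    previewEligible: bool
    profileEligible: bool
    biEligible: bool


def shared_bi_blockers(contract: StorageContract, table_columns: Set[str]) -> List[str]:
    """Return reasons to keep the dataset out of shared BI; an empty list means none were found."""
    reasons = []
    if not contract.biEligible:
        reasons.append("biEligible is false")
    if contract.physicalTableScope == UNVERIFIED:
        reasons.append("table scope is unverified")
    if contract.physicalTableScope == SHARED:
        column = contract.tenantColumnName
        if not column:
            reasons.append("shared table has no tenantColumnName")
        elif column not in table_columns:
            reasons.append(f"tenant column {column!r} missing from table")
    return reasons


contract = StorageContract(
    physicalLocationType="physical_table",   # hypothetical value
    physicalTableName="dfs_energy_rows",     # hypothetical table name
    physicalTableScope=SHARED,
    tenantColumnName="tenant_id",
    tenantPredicateMode="equals",            # hypothetical value
    previewEligible=True,
    profileEligible=True,
    biEligible=True,
)
print(shared_bi_blockers(contract, {"tenant_id", "asset_id", "value"}))
```

Returning a list of reasons rather than a single boolean keeps the result usable as troubleshooting evidence when preview or BI is blocked.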

Current-value lifecycle

DFS Lite point sync writes mapped rows into a staging buffer. Promotion then updates:

the current-value read model for latest-value reads;
the bounded current-value history model for short trend windows.

Rows missing connector identity, mapped entity, or mapped field should stay visible as skipped promotion records so the source owner can repair mapping quality.
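The promotion step described above can be sketched as a single pass over staged rows. This is an illustrative model only: the row keys (`connector_id`, `entity_id`, `field`, `ts`, `value`), the `HISTORY_LIMIT` bound, and the in-memory dictionaries standing in for the read models are all assumptions, and the sketch expects staged rows in timestamp order.

```python
from collections import deque

# Hypothetical key names for connector identity, mapped entity, and mapped field.
REQUIRED_KEYS = ("connector_id", "entity_id", "field")
# Hypothetical bound on trend points kept per point key.
HISTORY_LIMIT = 3


def promote(staging_rows, current, history, skipped):
    """Promote staged rows into the current-value and bounded-history models.

    Rows missing any required key are recorded in `skipped` with the
    missing key names instead of being dropped silently.
    """
    for row in staging_rows:
        missing = [key for key in REQUIRED_KEYS if not row.get(key)]
        if missing:
            skipped.append({"row": row, "missing": missing})
            continue
        point = (row["connector_id"], row["entity_id"], row["field"])
        # Latest-value read model: the most recent row overwrites the value.
        current[point] = row["value"]
        # Bounded history: deque(maxlen=...) discards the oldest entry when full.
        history.setdefault(point, deque(maxlen=HISTORY_LIMIT)).append((row["ts"], row["value"]))


current, history, skipped = {}, {}, []
rows = [
    {"connector_id": "c1", "entity_id": "pump-7", "field": "temp", "ts": t, "value": 20 + t}
    for t in range(4)
]
rows.append({"connector_id": "c1", "entity_id": None, "field": "temp", "ts": 4, "value": 99})
promote(rows, current, history, skipped)
```

After this run, the current value for the pump is the last promoted reading, the history holds only the three most recent points, and the unmapped row appears in `skipped` with `entity_id` listed as missing.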

Use current-value reads for:

latest equipment context;
facility or data center dashboards;
field triage;
short operational trend review;
AI Agent context that needs current state.

Use high-rate history storage for:

long lookback windows;
predictive maintenance training data;
high-frequency telemetry analytics;
retention periods beyond the bounded trend window.
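The split between the two lists above can be expressed as a simple routing rule. The sketch below is an assumption-laden illustration: the 24-hour `BOUNDED_TREND_WINDOW`, the path labels, and the `needs_raw_rate` flag are hypothetical, and the real bounded window depends on the deployment.

```python
from datetime import timedelta

# Hypothetical bounded trend window; the actual value is deployment-specific.
BOUNDED_TREND_WINDOW = timedelta(hours=24)


def choose_read_path(lookback: timedelta, needs_raw_rate: bool = False) -> str:
    """Pick a read path from the requested lookback and sampling need."""
    if needs_raw_rate or lookback > BOUNDED_TREND_WINDOW:
        # Long lookbacks, training data, and high-frequency analytics.
        return "high_rate_history"
    if lookback == timedelta(0):
        # Latest equipment context, dashboards, field triage.
        return "current_value"
    # Short operational trend review.
    return "current_value_history"


print(choose_read_path(timedelta(0)))
print(choose_read_path(timedelta(hours=2)))
print(choose_read_path(timedelta(days=90)))
```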

High-rate time-series storage

For sustained telemetry ingest, plan a high-rate storage path. A ClickHouse-backed deployment can own raw telemetry history and rollups while the backend database keeps metadata, staging, current values, governance, and contracts.

Plan these items before production onboarding:

Area	Planning question
Ingest rate	Expected events per second, average bytes per event, burst behavior, and source schedule.
Retention	Raw retention, minute/hour/day rollups, customer data-retention policy, and backup scope.
Queue behavior	Pending, retry, sent, and dead-letter handling for telemetry writes.
Load budget	Whether projected 48-hour data growth fits the approved storage budget.
Tenant isolation	Tenant-scoped query shape, negative read checks, and role boundaries.
Operations	Monitoring, storage growth alerts, replay procedure, and incident closeout evidence.

High-rate telemetry should pass a load gate before scheduled source onboarding. A failing load gate means the source contract needs throttling, rollup, retention changes, or capacity changes before production use.
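The load-budget part of the gate is plain arithmetic: sustained events per second, times average bytes per event, times the seconds in 48 hours. The sketch below assumes uncompressed raw size and a steady rate; the function names and the example figures are hypothetical, and a real gate would also account for bursts, compression, and rollup overhead.

```python
def projected_growth_bytes(events_per_second: float,
                           avg_bytes_per_event: float,
                           hours: float = 48.0) -> float:
    """Estimate raw data growth over the window, ignoring compression and bursts."""
    return events_per_second * avg_bytes_per_event * hours * 3600


def load_gate_passes(events_per_second: float,
                     avg_bytes_per_event: float,
                     budget_bytes: float) -> bool:
    """True when projected 48-hour growth fits the approved storage budget."""
    return projected_growth_bytes(events_per_second, avg_bytes_per_event) <= budget_bytes


# Example figures (assumed): 5,000 events/s at 200 bytes each.
growth = projected_growth_bytes(5000, 200)
print(f"{growth / 1e9:.1f} GB over 48 hours")            # 172.8 GB over 48 hours
print(load_gate_passes(5000, 200, budget_bytes=150e9))   # False
```

In this example the source would fail a 150 GB budget, so the source contract would need throttling, rollup, retention changes, or more capacity before onboarding.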

Pipeline output materialization

AI Engine and data pipelines should publish DFS outputs through a backend-owned materialization contract when the output becomes a governed dataset. The contract should carry:

pipeline, run, and node identity;
tenant identity;
row and column counts;
column schema;
bounded inline records when allowed;
storage contract metadata;
downstream dataset or warehouse reference returned by the backend.

This keeps dataset lifecycle, BI eligibility, tenant scoping, and audit ownership in the platform layer.
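A publish request that carries the items listed above might be assembled as follows. This is a sketch under stated assumptions: the payload key names, the `MAX_INLINE_RECORDS` limit, and the `build_materialization_request` helper are hypothetical and do not describe the actual backend API. The downstream dataset or warehouse reference is omitted because the backend returns it.

```python
from typing import Any, Dict, List

# Hypothetical cap on inline records carried in the request.
MAX_INLINE_RECORDS = 100


def build_materialization_request(pipeline_id: str, run_id: str, node_id: str,
                                  tenant_id: str,
                                  columns: List[Dict[str, str]],
                                  rows: List[Dict[str, Any]],
                                  storage_contract: Dict[str, Any],
                                  allow_inline: bool) -> Dict[str, Any]:
    """Assemble a publish request; all key names here are illustrative."""
    return {
        "pipelineId": pipeline_id,
        "runId": run_id,
        "nodeId": node_id,
        "tenantId": tenant_id,
        "rowCount": len(rows),
        "columnCount": len(columns),
        "columns": columns,
        # Inline records are bounded, and sent only when allowed.
        "inlineRecords": rows[:MAX_INLINE_RECORDS] if allow_inline else [],
        "storageContract": storage_contract,
    }


request = build_materialization_request(
    pipeline_id="pm-anomaly", run_id="run-001", node_id="score",
    tenant_id="tenant-a",
    columns=[{"name": "asset_id", "type": "string"}, {"name": "score", "type": "double"}],
    rows=[{"asset_id": "pump-7", "score": 0.91}, {"asset_id": "pump-8", "score": 0.12}],
    storage_contract={"physicalTableScope": "shared_with_tenant_column",
                      "tenantColumnName": "tenant_id"},
    allow_inline=True,
)
print(request["rowCount"], request["columnCount"])
```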

Validation checklist

Dataset storage contracts are present for materialized datasets used by preview, profile, BI, or AI Agent workflows.
Shared physical tables include the tenant column required by the storage contract.
Legacy unverified tables are excluded from shared BI and production AI workflows.
Current-value and trend reads come from promoted read models, not from an unbounded staging buffer.
High-rate telemetry has a storage owner, retention policy, durable queue behavior, and load-budget check.
Dead-letter telemetry rows have an operator review and replay process.
AI Engine outputs that become governed datasets use a backend-owned materialization contract.
Customer deployment capacity includes database I/O, ClickHouse or equivalent telemetry storage, queue capacity, backup scope, and monitoring.

Troubleshooting

Symptom	Check
Dataset preview is blocked	Storage contract, tenant column, table scope, dataset status, and user permission.
BI cannot query a dataset	`biEligible`, tenant-scoped table, column whitelist, and dataset validation state.
Latest values are stale	Connector sync, staging promotion status, skipped promotion reasons, and cleanup policy.
Trend window is too short	Current-value history retention, product expectation, and high-rate history requirement.
Telemetry queue grows	ClickHouse availability, retry count, dead-letter reason, ingest rate, and storage budget.
Pipeline output collides with a table	Use backend-owned materialization and treat direct table replacement as an exception requiring project approval.
