Storage and Time-Series Boundary
DFS uses different storage paths for governed datasets, current operational values, and high-rate time-series history. Use this page when planning customer-side deployments, connector onboarding, AI Engine pipelines, BI datasets, or predictive-maintenance signal history.
The goal is simple: operational reads should be tenant-scoped, repeatable, and traceable, while high-rate history uses storage designed for sustained ingest.
Storage model
| Data class | Typical store | Purpose |
|---|---|---|
| DFS dataset metadata | FactVerse backend database | Dataset ownership, schema, lineage, lifecycle, storage contract, steward state. |
| Materialized dataset rows | Backend-owned table or approved external location | Preview, profile, BI query, fusion input, AI Agent evidence. |
| DFS Lite staging rows | Bounded staging table | Recently mapped point rows awaiting promotion. |
| Current values | Current-value read model | Latest point value used by dashboards, asset context, and operational review. |
| Bounded trend values | Current-value history read model | Short trend windows for operational screens. |
| High-rate raw telemetry | ClickHouse when enabled by the deployment | Long-running telemetry history, rollups, and high-rate analytics. |
| Pipeline outputs | Backend-owned materialization contract | AI Engine or pipeline results published as governed datasets. |
Dataset storage contract
A materialized dataset should carry a storage contract that explains where rows live and whether DFS preview, profile, and BI query can use it.
Important contract fields include:
| Field | Meaning |
|---|---|
| physicalLocationType | Physical table, external URI, or metadata-only dataset. |
| physicalTableName | Table used by preview, profile, or BI when present. |
| physicalTableScope | Shared table with tenant column, tenant-owned table, metadata-only, or unverified existing table. |
| tenantColumnName | Tenant column used for scoped reads, normally tenant_id. |
| tenantPredicateMode | Tenant predicate shape used by preview, profile, and BI. |
| previewEligible and profileEligible | Whether DFS preview and profile can read the dataset. |
| biEligible | Whether BI dataset query can use the dataset. |
Shared materialized tables should include tenant_id. An existing table without tenant scope should remain outside shared BI and production AI workflows until the owner classifies or remediates it.
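A minimal Python sketch of how a consumer might apply these fields before letting shared BI read a dataset. The snake_case attributes mirror the camelCase contract fields above; the TableScope values, the StorageContract class, and may_serve_shared_bi are hypothetical names for illustration, not the platform's API.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class TableScope(Enum):
    # Hypothetical values mirroring the physicalTableScope descriptions.
    SHARED_WITH_TENANT_COLUMN = "shared_with_tenant_column"
    TENANT_OWNED = "tenant_owned"
    METADATA_ONLY = "metadata_only"
    UNVERIFIED_EXISTING = "unverified_existing"


@dataclass
class StorageContract:
    physical_location_type: str          # e.g. "physical_table", "external_uri", "metadata_only"
    physical_table_name: Optional[str]
    physical_table_scope: TableScope
    tenant_column_name: Optional[str]
    preview_eligible: bool = False
    profile_eligible: bool = False
    bi_eligible: bool = False


def may_serve_shared_bi(contract: StorageContract, table_columns: set[str]) -> bool:
    """Return True only if a shared BI query could safely read this dataset."""
    if not contract.bi_eligible or contract.physical_table_name is None:
        return False
    if contract.physical_table_scope is TableScope.UNVERIFIED_EXISTING:
        # Unverified existing tables stay out of shared BI until classified or remediated.
        return False
    if contract.physical_table_scope is TableScope.SHARED_WITH_TENANT_COLUMN:
        # A shared table must actually contain the declared tenant column.
        return (
            contract.tenant_column_name is not None
            and contract.tenant_column_name in table_columns
        )
    # Tenant-owned tables qualify; metadata-only datasets have no rows to query.
    return contract.physical_table_scope is TableScope.TENANT_OWNED
```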
Current-value lifecycle
DFS Lite point sync writes mapped rows into a staging buffer. Promotion then updates:
- the current-value read model for latest-value reads;
- the bounded current-value history model for short trend windows.
Rows missing connector identity, mapped entity, or mapped field should stay visible as skipped promotion records so the source owner can repair mapping quality.
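The promotion step can be sketched as follows, assuming staging rows arrive as simple records. StagingRow, ReadModels, promote, and TREND_WINDOW are hypothetical names; the real trend-window bound and read-model schemas are deployment details this page does not specify.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Optional

TREND_WINDOW = 5  # hypothetical number of points kept per (entity, field)


@dataclass
class StagingRow:
    connector_id: Optional[str]
    entity_id: Optional[str]
    field_name: Optional[str]
    value: float
    ts: int  # epoch seconds


@dataclass
class ReadModels:
    current: dict = field(default_factory=dict)   # (entity, field) -> (ts, value)
    history: dict = field(default_factory=dict)   # (entity, field) -> deque of (ts, value)
    skipped: list = field(default_factory=list)   # (row, reason) kept visible for the source owner


def promote(rows: list[StagingRow], models: ReadModels) -> None:
    for row in rows:
        missing = [name for name, v in (
            ("connector identity", row.connector_id),
            ("mapped entity", row.entity_id),
            ("mapped field", row.field_name),
        ) if not v]
        if missing:
            # Record the skip instead of dropping the row silently.
            models.skipped.append((row, "missing " + ", ".join(missing)))
            continue
        key = (row.entity_id, row.field_name)
        # Latest-value model: keep only the newest timestamp per key.
        prev = models.current.get(key)
        if prev is None or row.ts >= prev[0]:
            models.current[key] = (row.ts, row.value)
        # Bounded trend model: the deque keeps the last TREND_WINDOW promoted
        # points in arrival order and discards the oldest once full.
        models.history.setdefault(key, deque(maxlen=TREND_WINDOW)).append((row.ts, row.value))
```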
Use current-value reads for:
- latest equipment context;
- facility or data center dashboards;
- field triage;
- short operational trend review;
- AI Agent context that needs current state.
Use high-rate history storage for:
- long lookback windows;
- predictive maintenance training data;
- high-frequency telemetry analytics;
- retention periods beyond the bounded trend window.
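A sketch of the routing rule implied by these two lists, assuming a hypothetical 24-hour bounded trend window. BOUNDED_TREND_WINDOW and choose_store are illustrative names; the actual window is set by the deployment.

```python
from datetime import timedelta

# Hypothetical bound; the real trend window depends on the deployment.
BOUNDED_TREND_WINDOW = timedelta(hours=24)


def choose_store(lookback: timedelta, high_rate_enabled: bool) -> str:
    """Pick a read path for a query covering the given lookback window."""
    if lookback == timedelta(0):
        return "current-value read model"          # latest state only
    if lookback <= BOUNDED_TREND_WINDOW:
        return "current-value history read model"  # short operational trend
    if high_rate_enabled:
        return "high-rate history storage"         # long lookback or training data
    # The bounded trend model cannot answer longer windows on its own.
    raise ValueError("lookback exceeds the bounded trend window and no high-rate store is configured")
```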
High-rate time-series storage
For sustained telemetry ingest, plan a high-rate storage path. A ClickHouse-backed deployment can own raw telemetry history and rollups while the backend database keeps metadata, staging, current values, governance, and contracts.
Plan these items before production onboarding:
| Area | Planning question |
|---|---|
| Ingest rate | Expected events per second, average bytes per event, burst behavior, and source schedule. |
| Retention | Raw retention, minute/hour/day rollups, customer data-retention policy, and backup scope. |
| Queue behavior | Pending, retry, sent, and dead-letter handling for telemetry writes. |
| Load budget | Whether projected 48-hour data growth fits the approved storage budget. |
| Tenant isolation | Tenant-scoped query shape, negative read checks, and role boundaries. |
| Operations | Monitoring, storage growth alerts, replay procedure, and incident closeout evidence. |
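The queue states in the table can be modeled as a small state machine. This is a sketch under assumptions: WriteState, next_state, replay, and the MAX_ATTEMPTS retry budget are hypothetical, and a production queue would also persist each state durably.

```python
from enum import Enum

MAX_ATTEMPTS = 3  # hypothetical retry budget before a write is dead-lettered


class WriteState(Enum):
    PENDING = "pending"
    RETRY = "retry"
    SENT = "sent"
    DEAD_LETTER = "dead_letter"


def next_state(state: WriteState, write_ok: bool, attempts: int) -> WriteState:
    """Advance one telemetry write after an attempt; attempts counts tries so far."""
    if state in (WriteState.SENT, WriteState.DEAD_LETTER):
        # Terminal: dead letters wait for operator review instead of auto-retrying.
        return state
    if write_ok:
        return WriteState.SENT
    return WriteState.DEAD_LETTER if attempts >= MAX_ATTEMPTS else WriteState.RETRY


def replay(state: WriteState) -> WriteState:
    """Operator-initiated replay returns a reviewed dead letter to the queue."""
    if state is not WriteState.DEAD_LETTER:
        raise ValueError("only dead-lettered writes can be replayed")
    return WriteState.PENDING
```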
High-rate telemetry should pass a load gate before scheduled source onboarding. A failing load gate means the source contract needs throttling, rollup, retention changes, or capacity changes before production use.
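The 48-hour growth question reduces to arithmetic: events per second × average bytes per event × 172,800 seconds. The sketch below applies it with a hypothetical burst multiplier. It estimates raw, uncompressed bytes only, so it ignores ClickHouse compression, rollup tables, and replication, which a real capacity review would also account for. The function names are illustrative.

```python
SECONDS_IN_48H = 48 * 60 * 60  # 172,800 seconds


def projected_48h_bytes(events_per_second: float, avg_bytes_per_event: float,
                        burst_factor: float = 1.0) -> float:
    """Raw, uncompressed bytes written over 48 hours at the given rate."""
    # burst_factor is a hypothetical safety multiplier for burst behavior.
    return events_per_second * avg_bytes_per_event * burst_factor * SECONDS_IN_48H


def load_gate_passes(events_per_second: float, avg_bytes_per_event: float,
                     budget_bytes: float, burst_factor: float = 1.0) -> bool:
    """Pass only if projected 48-hour growth fits within the approved budget."""
    return projected_48h_bytes(events_per_second, avg_bytes_per_event, burst_factor) <= budget_bytes
```

For example, 2,000 events per second at 200 bytes per event is 400,000 bytes per second, or 69,120,000,000 bytes (about 69 GB) over 48 hours, which fails a 50 GB budget before compression is considered.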
Pipeline output materialization
AI Engine and data pipelines should publish DFS outputs through a backend-owned materialization contract when the output becomes a governed dataset. The contract should carry:
- pipeline, run, and node identity;
- tenant identity;
- row and column counts;
- column schema;
- bounded inline records when allowed;
- storage contract metadata;
- downstream dataset or warehouse reference returned by the backend.
This keeps dataset lifecycle, BI eligibility, tenant scoping, and audit ownership in the platform layer.
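A hypothetical request shape for that contract, sketched in Python. The class name, field names, and MAX_INLINE_RECORDS bound are assumptions; dataset_ref stays empty until the backend returns the downstream reference.

```python
from dataclasses import dataclass, field
from typing import Any, Optional

MAX_INLINE_RECORDS = 100  # hypothetical bound on inline records


@dataclass
class MaterializationRequest:
    pipeline_id: str
    run_id: str
    node_id: str
    tenant_id: str
    row_count: int
    column_count: int
    column_schema: list[dict[str, str]]          # e.g. [{"name": "ts", "type": "timestamp"}]
    storage_contract: dict[str, Any]
    inline_records: list[dict[str, Any]] = field(default_factory=list)
    dataset_ref: Optional[str] = None             # set from the backend response, not by the caller

    def validate(self) -> list[str]:
        """Return a list of problems; an empty list means the request looks consistent."""
        problems = []
        if not self.tenant_id:
            problems.append("tenant identity is required")
        if self.column_count != len(self.column_schema):
            problems.append("column count does not match the schema")
        if len(self.inline_records) > MAX_INLINE_RECORDS:
            problems.append("inline records exceed the allowed bound")
        names = {c["name"] for c in self.column_schema}
        if any(set(rec) - names for rec in self.inline_records):
            problems.append("inline record has columns outside the schema")
        if len(self.inline_records) > self.row_count:
            problems.append("inline records exceed the declared row count")
        return problems
```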
Validation checklist
- Dataset storage contracts are present for materialized datasets used by preview, profile, BI, or AI Agent workflows.
- Shared physical tables include the tenant column required by the storage contract.
- Legacy unverified tables are excluded from shared BI and production AI workflows.
- Current-value and trend reads come from promoted read models, not from an unbounded staging buffer.
- High-rate telemetry has a storage owner, retention policy, durable queue behavior, and load-budget check.
- Dead-letter telemetry rows have an operator review and replay process.
- AI Engine outputs that become governed datasets use a backend-owned materialization contract.
- Customer deployment capacity includes database I/O, ClickHouse or equivalent telemetry storage, queue capacity, backup scope, and monitoring.
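The telemetry items in this checklist can be checked mechanically before onboarding. TelemetrySourcePlan and checklist_gaps are hypothetical names; the fields restate the checklist rather than any platform schema.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TelemetrySourcePlan:
    # Hypothetical planning record; field names are illustrative.
    storage_owner: Optional[str]
    raw_retention_days: Optional[int]
    queue_is_durable: bool
    load_gate_passed: bool
    dead_letter_replay_documented: bool


def checklist_gaps(plan: TelemetrySourcePlan) -> list[str]:
    """List the telemetry checklist items that a source plan does not yet satisfy."""
    gaps = []
    if not plan.storage_owner:
        gaps.append("no storage owner")
    if plan.raw_retention_days is None:
        gaps.append("no retention policy")
    if not plan.queue_is_durable:
        gaps.append("queue behavior is not durable")
    if not plan.load_gate_passed:
        gaps.append("load-budget check has not passed")
    if not plan.dead_letter_replay_documented:
        gaps.append("no dead-letter review and replay process")
    return gaps
```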
Troubleshooting
| Symptom | Check |
|---|---|
| Dataset preview is blocked | Storage contract, tenant column, table scope, dataset status, and user permission. |
| BI cannot query a dataset | biEligible, tenant-scoped table, column whitelist, and dataset validation state. |
| Latest values are stale | Connector sync, staging promotion status, skipped promotion reasons, and cleanup policy. |
| Trend window is too short | Current-value history retention, product expectation, and high-rate history requirement. |
| Telemetry queue grows | ClickHouse availability, retry count, dead-letter reason, ingest rate, and storage budget. |
| Pipeline output collides with a table | Whether the output was published through backend-owned materialization; treat direct table replacement as an exception that requires project approval. |
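For the "BI cannot query a dataset" row, a tenant-scoped query with a column whitelist might look like the sketch below. build_scoped_query is a hypothetical helper, not the platform's BI API; it assumes the default tenant_id column and a qmark-style (?) DB-API placeholder, as used by Python's sqlite3.

```python
import re

# Plain SQL identifiers only; anything else is rejected before building SQL.
_IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def build_scoped_query(table: str, columns: list[str], allowed: set[str],
                       tenant_column: str = "tenant_id", limit: int = 100) -> str:
    """Return a parameterized SELECT that always filters on the tenant column."""
    for name in [table, tenant_column, *columns]:
        if not _IDENT.fullmatch(name):
            raise ValueError(f"unsafe identifier: {name!r}")
    rejected = [c for c in columns if c not in allowed]
    if rejected:
        raise ValueError(f"columns not in whitelist: {rejected}")
    cols = ", ".join(columns)
    # The tenant value is bound as a parameter at execution time, never interpolated.
    return f"SELECT {cols} FROM {table} WHERE {tenant_column} = ? LIMIT {int(limit)}"
```

Binding the tenant value as a parameter and validating identifiers keeps crafted input from bypassing the tenant predicate. A negative read check then amounts to running the query with another tenant's ID and expecting no rows.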