DFS Pro Datasets
A DFS Pro dataset is a governed data asset. It can come from a connector, a file import, a subset extraction, or a fusion output.
Use datasets when data needs to be reusable, reviewed, profiled, versioned, or consumed by AI Agent, predictive maintenance, BI, or operational workflows.
Prerequisites
Before creating or promoting a dataset, confirm:
- DFS Pro package access is enabled;
- the source owner and data steward are known;
- source type and source ID are available;
- time, identity, metric, and required columns are understood;
- downstream consumers have been identified;
- quality expectations and refresh cadence are documented.
Dataset workflow
Open Dataset Center
Go to:
Data Integration > Dataset Center
Dataset Center lists the available datasets and shows each dataset's source type, row count, column count, tags, and last update time.
Dataset source types
| Source type | Use |
|---|---|
| File import | Data uploaded or imported from a file. |
| Connector | Data produced by a DFS Lite connector. |
| Subset extraction | A filtered or selected subset from another dataset. |
| Fusion output | Output from a DFS Pro fusion task. |
Source contract and readiness
When a dataset comes from DFS Lite or another governed source, carry the source contract into Dataset Center. The contract should make the reuse boundary explicit:
| Contract item | Guidance |
|---|---|
| Source owner | Name the team or role that can explain source behavior and approve schema changes. |
| Refresh cadence | Record whether the data is one-time, scheduled, event-driven, or manually refreshed. |
| Required fields | Identify identity, time, metric, status, and evidence fields that downstream workflows depend on. |
| Quality gate | Define the preview, profile, null ratio, distinct ratio, and failed-row checks required before reuse. |
| Consumer scope | State whether the dataset feeds fusion tasks, MDM, BI, AI Agent, Inspector, or another application. |
| Readiness status | Keep the dataset out of shared production workflows until the required owner and quality checks are complete. |
Use this contract to decide whether the dataset can be selected by a fusion task, published as a data product, or handed to an AI workflow.
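The contract items above can be tracked as a simple record with a readiness check. The following is a minimal sketch, not a DFS Pro API: the `SourceContract` class, its field names, and the `readiness_gaps` helper are all hypothetical and only illustrate how missing contract items could block reuse.

```python
from dataclasses import dataclass, field


@dataclass
class SourceContract:
    """Hypothetical record of the contract items (not a DFS Pro object)."""
    source_owner: str = ""
    refresh_cadence: str = ""  # e.g. "one-time", "scheduled", "event-driven", "manual"
    required_fields: list[str] = field(default_factory=list)
    quality_gate_passed: bool = False  # set after preview, profile, and failed-row checks
    consumer_scope: list[str] = field(default_factory=list)


def readiness_gaps(contract: SourceContract) -> list[str]:
    """Return the contract items that are still missing or incomplete."""
    gaps = []
    if not contract.source_owner:
        gaps.append("source owner")
    if not contract.refresh_cadence:
        gaps.append("refresh cadence")
    if not contract.required_fields:
        gaps.append("required fields")
    if not contract.quality_gate_passed:
        gaps.append("quality gate")
    if not contract.consumer_scope:
        gaps.append("consumer scope")
    return gaps


draft = SourceContract(
    source_owner="Plant operations data team",
    refresh_cadence="scheduled",
    required_fields=["asset_id", "event_time", "value"],
)
print(readiness_gaps(draft))  # ['quality gate', 'consumer scope']
```

An empty list from `readiness_gaps` would correspond to a dataset whose contract is complete enough to consider for shared workflows.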
Create a dataset
- Open Dataset Center.
- Select Create Dataset.
- Enter a dataset name.
- Add a description that explains the data owner, source, and purpose.
- Choose the source type.
- Save the dataset.
- Open the dataset detail page.
Use names that a reader outside the original project can still interpret.
Examples:
- Chiller plant sensor history
- CMMS work orders normalized
- Inspection findings by asset
- Compressor vibration features
Inspect a dataset
On Dataset Detail, review:
- name and description;
- source type and source ID;
- table name when present;
- row count;
- column count;
- column schema;
- time column;
- metric column;
- tags;
- quality issues;
- status;
- steward;
- current version.
Use preview to inspect sample rows. Use profile to inspect column-level statistics such as row count, null ratio, and distinct ratio.
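To make the profile statistics concrete, the sketch below computes them for one column of an in-memory sample. It is an illustration only and assumes definitions that DFS Pro may not share: a value counts as null when it is `None` or an empty string, and the distinct ratio is the number of distinct non-null values divided by the number of non-null values. The `profile_column` function and the sample columns are hypothetical.

```python
def profile_column(rows: list[dict], column: str) -> dict:
    """Compute row count, null ratio, and distinct ratio for one column.

    Assumed definitions: null means None or "", and the distinct ratio is
    distinct non-null values divided by non-null values.
    """
    values = [row.get(column) for row in rows]
    non_null = [v for v in values if v is not None and v != ""]
    row_count = len(values)
    return {
        "row_count": row_count,
        "null_ratio": (row_count - len(non_null)) / row_count if row_count else 0.0,
        "distinct_ratio": len(set(non_null)) / len(non_null) if non_null else 0.0,
    }


sample = [
    {"asset_id": "CH-01", "value": 6.8},
    {"asset_id": "CH-01", "value": 7.1},
    {"asset_id": "CH-02", "value": None},
    {"asset_id": "CH-02", "value": 7.1},
]
print(profile_column(sample, "asset_id"))
# {'row_count': 4, 'null_ratio': 0.0, 'distinct_ratio': 0.5}
```

A low distinct ratio is expected for an identity column that repeats across time-series rows. The same value on a column that should be unique would be worth reviewing.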
Import, reprocess, and rejected rows
File and connector-backed datasets can produce accepted rows and rejected rows. Rejected rows are operational feedback for source owners, connector owners, and downstream workflow owners.
Common checks:
- schema fields are present and named as expected;
- required identity and time columns are populated;
- numeric, date, and status values can be parsed;
- duplicate rows are explainable;
- row-level errors are visible to the source owner;
- the accepted-row count and rejected-row count match the import expectation.
When the source owner fixes the input, reprocess the corrected file or source slice and compare the run totals. Keep the earlier failed-run evidence until downstream owners agree that the replacement dataset is safe to use.
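The checks and the run comparison above can be sketched as follows. This is an assumption-laden illustration rather than DFS Pro's import logic: the required column names, the `split_rows` and `run_totals` helpers, and the rejection reasons are all hypothetical.

```python
from datetime import datetime

# Hypothetical required columns for a time-based metric dataset.
REQUIRED = ("asset_id", "event_time", "value")


def split_rows(rows: list[dict]) -> tuple[list[dict], list[dict]]:
    """Split rows into accepted rows and rejected rows annotated with a reason."""
    accepted, rejected = [], []
    for row in rows:
        reason = None
        missing = [c for c in REQUIRED if row.get(c) in (None, "")]
        if missing:
            reason = "missing required: " + ", ".join(missing)
        else:
            try:
                datetime.fromisoformat(row["event_time"])
                float(row["value"])
            except (TypeError, ValueError):
                reason = "unparseable time or value"
        if reason is None:
            accepted.append(row)
        else:
            rejected.append({**row, "reject_reason": reason})
    return accepted, rejected


def run_totals(rows: list[dict]) -> dict:
    """Summarize one import run so two runs can be compared."""
    accepted, rejected = split_rows(rows)
    return {"accepted": len(accepted), "rejected": len(rejected)}


first_run = [
    {"asset_id": "CH-01", "event_time": "2024-05-01T10:00:00", "value": "6.8"},
    {"asset_id": "", "event_time": "2024-05-01T10:05:00", "value": "7.1"},
    {"asset_id": "CH-02", "event_time": "2024-05-01T10:10:00", "value": "n/a"},
]
corrected_run = [
    {"asset_id": "CH-01", "event_time": "2024-05-01T10:00:00", "value": "6.8"},
    {"asset_id": "CH-01", "event_time": "2024-05-01T10:05:00", "value": "7.1"},
    {"asset_id": "CH-02", "event_time": "2024-05-01T10:10:00", "value": "7.4"},
]
print(run_totals(first_run))      # {'accepted': 1, 'rejected': 2}
print(run_totals(corrected_run))  # {'accepted': 3, 'rejected': 0}
```

Keeping the `reject_reason` with each rejected row gives the source owner row-level feedback, and the two totals give downstream owners a simple basis for comparing the failed run with the corrected one.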
Assign a steward
Assign a steward when a dataset will be used beyond a one-time test.
The steward should be able to answer:
- where the data came from;
- which source system owns it;
- how often it should update;
- which columns are required;
- which quality issues block downstream use;
- who approves schema changes.
Validate a dataset
Validate a dataset after preview, profile, and source ownership are clear.
Validation checklist:
- dataset name and description are clear;
- source type and source ID are correct;
- time column is correct when the dataset is time-based;
- metric column is correct when the dataset is metric-based;
- required columns are present;
- null and distinct ratios are understood;
- quality issues are reviewed;
- steward is assigned;
- downstream consumers are known.
After validation, the dataset can be used as a more reliable input for fusion tasks, AI Agent workflows, predictive maintenance, or BI reports.
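Parts of the checklist can be checked mechanically against dataset metadata. The sketch below assumes the metadata is available as a plain dictionary. The keys (`columns`, `time_column`, `metric_column`, `steward`, `unreviewed_quality_issues`) and the `validation_failures` helper are hypothetical names, not DFS Pro fields.

```python
def validation_failures(dataset: dict, required_columns: list[str]) -> list[str]:
    """Return the checklist items that fail for a dataset metadata dict."""
    failures = []
    schema = set(dataset.get("columns", []))

    missing = [c for c in required_columns if c not in schema]
    if missing:
        failures.append("missing required columns: " + ", ".join(missing))

    # Time and metric columns are only checked when the dataset declares them.
    for key in ("time_column", "metric_column"):
        column = dataset.get(key)
        if column and column not in schema:
            failures.append(f"{key} '{column}' is not in the schema")

    if not dataset.get("steward"):
        failures.append("steward is not assigned")
    if dataset.get("unreviewed_quality_issues", 0) > 0:
        failures.append("quality issues are not reviewed")
    return failures


candidate = {
    "columns": ["asset_id", "event_time", "value"],
    "time_column": "event_ts",
    "metric_column": "value",
    "steward": "",
}
for failure in validation_failures(candidate, ["asset_id", "event_time", "status"]):
    print(failure)
```

Items such as a clear description or known downstream consumers still need human review. A check like this only covers the items that can be derived from metadata.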
Use version history
Dataset version history helps review schema changes. Use it when:
- columns are added or removed;
- column names change;
- data types change;
- a downstream dashboard, fusion task, or AI workflow depends on the dataset.
Before making a breaking change, check change impact so downstream users can review the effect.
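A schema comparison between two versions can flag the changes listed above. The sketch below is illustrative: it represents each version as a `{column_name: data_type}` mapping and assumes that removed columns and type changes are breaking while added columns are not. The helper names and that rule are assumptions, and a renamed column shows up as one removal plus one addition.

```python
def schema_diff(old: dict[str, str], new: dict[str, str]) -> dict[str, list[str]]:
    """Compare two {column_name: data_type} mappings."""
    return {
        "added": sorted(set(new) - set(old)),
        "removed": sorted(set(old) - set(new)),
        "type_changed": sorted(c for c in old.keys() & new.keys() if old[c] != new[c]),
    }


def is_breaking(diff: dict[str, list[str]]) -> bool:
    """Assumed rule: removals and type changes break consumers, additions do not."""
    return bool(diff["removed"] or diff["type_changed"])


v1 = {"asset_id": "string", "event_time": "timestamp", "value": "double"}
v2 = {"asset_id": "string", "event_ts": "timestamp", "value": "string", "unit": "string"}

diff = schema_diff(v1, v2)
print(diff)
# {'added': ['event_ts', 'unit'], 'removed': ['event_time'], 'type_changed': ['value']}
print(is_breaking(diff))  # True
```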
Use lineage
Lineage shows upstream producers and downstream consumers. Use it before changing or deprecating a dataset.
Ask:
- Which fusion tasks use this dataset?
- Which reports use this dataset?
- Which AI Agent workflows depend on it?
- Which output datasets were produced from it?
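These questions amount to a downstream traversal of the lineage graph. The sketch below walks a small in-memory graph breadth-first. The `LINEAGE` mapping, the node naming scheme, and the `downstream` function are hypothetical and stand in for whatever lineage data the platform exposes.

```python
from collections import deque

# Hypothetical lineage edges: producer -> direct consumers.
LINEAGE = {
    "dataset:chiller_sensor_history": ["fusion:chiller_health", "report:energy_daily"],
    "fusion:chiller_health": ["dataset:chiller_health_features"],
    "dataset:chiller_health_features": ["agent:maintenance_assistant"],
}


def downstream(node: str, edges: dict[str, list[str]]) -> list[str]:
    """Return every direct and indirect consumer of `node`, breadth-first."""
    seen = {node}  # guards against cycles in the lineage data
    queue = deque([node])
    order = []
    while queue:
        current = queue.popleft()
        for consumer in edges.get(current, []):
            if consumer not in seen:
                seen.add(consumer)
                order.append(consumer)
                queue.append(consumer)
    return order


affected = downstream("dataset:chiller_sensor_history", LINEAGE)
print(affected)
# ['fusion:chiller_health', 'report:energy_daily',
#  'dataset:chiller_health_features', 'agent:maintenance_assistant']
```

Filtering the result by prefix (for example `report:` or `agent:`) would answer each question separately.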
Data product readiness
A data product is a dataset with enough ownership, quality, lineage, and consumer context for repeated use.
Before publishing or handing off a data product, confirm:
- the dataset owner and steward are recorded;
- source contract and refresh cadence are known;
- profile and quality checks are current;
- lineage shows the upstream source and downstream consumers;
- MDM entity IDs or reviewed event IDs are included when the workflow depends on governed identity;
- AI Agent, BI, Inspector, or maintenance workflows have a clear consumption path;
- known limitations and open exceptions are documented.
For AI Agent workflows, hand off the dataset with the intended question scope, evidence fields, identity fields, refresh cadence, and review owner. This lets the workflow retrieve operational evidence without treating raw source rows as approved knowledge.
Deprecate a dataset
Deprecate a dataset when a better dataset replaces it or when the source is retired.
Before deprecating:
- Check lineage.
- Notify downstream owners.
- Confirm the replacement dataset when one is needed.
- Record why the dataset is being deprecated.
Failure handling
If a dataset cannot be validated, keep it out of shared fusion tasks, BI reports, and AI Agent workflows until the owner resolves the issue.
| Failure | Action |
|---|---|
| Preview unavailable | Confirm source import, connector run, table creation, and dataset permissions. |
| Required columns missing | Return to source owner, connector mapping, or fusion task configuration. |
| Null or distinct ratios are unexpected | Review source extraction, filters, deduplication, and source-system changes. |
| Steward missing | Assign an accountable owner before reuse. |
| Downstream dependency blocks deprecation | Keep the dataset active until replacement and consumer migration are confirmed. |
Next step
Use Fusion Tasks when multiple datasets need to be combined or compared.