DFS Pro Datasets

A DFS Pro dataset is a governed data asset. It can come from a connector, a file import, a subset extraction, or a fusion output.

Use datasets when data needs to be reusable, reviewed, profiled, versioned, or consumed by AI Agent, predictive maintenance, BI, or operational workflows.

Prerequisites

Before creating or promoting a dataset, confirm:

  • DFS Pro package access is enabled;
  • the source owner and data steward are known;
  • source type and source ID are available;
  • time, identity, metric, and required columns are understood;
  • downstream consumers have been identified;
  • quality expectations and refresh cadence are documented.

Dataset workflow

Open Dataset Center

Go to:

Data Integration > Dataset Center

Dataset Center shows available datasets, source type, row count, column count, tags, and update time.

Dataset source types

Source type | Use
File import | Data uploaded or imported from a file.
Connector | Data produced by a DFS Lite connector.
Subset extraction | A filtered or selected subset from another dataset.
Fusion output | Output from a DFS Pro fusion task.

Source contract and readiness

When a dataset comes from DFS Lite or another governed source, carry the source contract into Dataset Center. The contract should make the reuse boundary explicit:

Contract item | Guidance
Source owner | Name the team or role that can explain source behavior and approve schema changes.
Refresh cadence | Record whether the data is one-time, scheduled, event-driven, or manually refreshed.
Required fields | Identify identity, time, metric, status, and evidence fields that downstream workflows depend on.
Quality gate | Define the preview, profile, null ratio, distinct ratio, and failed-row checks required before reuse.
Consumer scope | State whether the dataset feeds fusion tasks, MDM, BI, AI Agent, Inspector, or another application.
Readiness status | Keep the dataset out of shared production workflows until the required owner and quality checks are complete.

Use this contract to decide whether the dataset can be selected by a fusion task, published as a data product, or handed to an AI workflow.
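As an illustration only, the contract items above can be captured as a small record with a check that lists what is still missing. This is a minimal sketch, not a DFS Pro API: the `SourceContract` class, its field names, the cadence labels, and the sample values are all assumptions made for the example.

```python
from dataclasses import dataclass, field

# Hypothetical cadence labels taken from the "Refresh cadence" guidance above.
ALLOWED_CADENCES = {"one-time", "scheduled", "event-driven", "manual"}


@dataclass
class SourceContract:
    """Hypothetical record of a source contract; not a DFS Pro object."""
    source_owner: str
    refresh_cadence: str
    required_fields: list[str]
    quality_gate_passed: bool
    consumer_scope: list[str] = field(default_factory=list)


def readiness_gaps(contract: SourceContract) -> list[str]:
    """Return the contract items that still block shared production use."""
    gaps = []
    if not contract.source_owner.strip():
        gaps.append("source owner")
    if contract.refresh_cadence not in ALLOWED_CADENCES:
        gaps.append("refresh cadence")
    if not contract.required_fields:
        gaps.append("required fields")
    if not contract.quality_gate_passed:
        gaps.append("quality gate")
    if not contract.consumer_scope:
        gaps.append("consumer scope")
    return gaps


# Illustrative values only.
contract = SourceContract(
    source_owner="Plant data team",
    refresh_cadence="scheduled",
    required_fields=["asset_id", "timestamp", "vibration_rms"],
    quality_gate_passed=False,
    consumer_scope=["fusion tasks", "BI"],
)
print(readiness_gaps(contract))  # ['quality gate']
```

An empty list would correspond to a contract with no open items; any remaining entry is a reason to keep the dataset out of shared workflows.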

Create a dataset

  1. Open Dataset Center.
  2. Select Create Dataset.
  3. Enter a dataset name.
  4. Add a description that explains the data owner, source, and purpose.
  5. Choose a source type.
  6. Save the dataset.
  7. Open the dataset detail page.

Use names that still make sense to someone who was not in the original project meeting.

Examples:

  • Chiller plant sensor history
  • CMMS work orders normalized
  • Inspection findings by asset
  • Compressor vibration features

Inspect a dataset

On Dataset Detail, review:

  • name and description;
  • source type and source ID;
  • table name when present;
  • row count;
  • column count;
  • column schema;
  • time column;
  • metric column;
  • tags;
  • quality issues;
  • status;
  • steward;
  • current version.

Use preview to inspect sample rows. Use profile to inspect column-level statistics such as row count, null ratio, and distinct ratio.
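To make the profile statistics concrete, the sketch below computes them for one column of a small in-memory sample. It is not how DFS Pro calculates its profile. It assumes that empty strings count as nulls and that the distinct ratio is the number of distinct non-null values divided by the number of non-null values; the product may define these differently. The column names and rows are hypothetical.

```python
def profile_column(rows, column):
    """Compute row count, null ratio, and distinct ratio for one column."""
    values = [row.get(column) for row in rows]
    row_count = len(values)
    # Assumption: None and empty strings both count as null.
    non_null = [v for v in values if v is not None and v != ""]
    null_ratio = (row_count - len(non_null)) / row_count if row_count else 0.0
    # Assumption: distinct ratio = distinct non-null values / non-null values.
    distinct_ratio = len(set(non_null)) / len(non_null) if non_null else 0.0
    return {
        "row_count": row_count,
        "null_ratio": null_ratio,
        "distinct_ratio": distinct_ratio,
    }


# Hypothetical sample rows.
sample = [
    {"asset_id": "CH-01", "status": "ok"},
    {"asset_id": "CH-02", "status": "ok"},
    {"asset_id": "CH-03", "status": None},
    {"asset_id": "CH-04", "status": "alarm"},
]

for name in ("asset_id", "status"):
    print(name, profile_column(sample, name))
```

In this sample, `asset_id` has a distinct ratio of 1.0, which is what an identity column should show, while `status` has a null ratio of 0.25 that the source owner would need to explain.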

Import, reprocess, and rejected rows

File and connector-backed datasets can produce accepted rows and rejected rows. Rejected rows are operational feedback for source owners, connector owners, and downstream workflow owners.

Common checks:

  • schema fields are present and named as expected;
  • required identity and time columns are populated;
  • numeric, date, and status values can be parsed;
  • duplicate rows are explainable;
  • row-level errors are visible to the source owner;
  • the accepted-row count and rejected-row count match the import expectation.

When the source owner fixes the input, reprocess the corrected file or source slice and compare the run totals. Keep the earlier failed-run evidence until downstream owners agree that the replacement dataset is safe to use.
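The accept-or-reject split and the comparison of run totals can be sketched as follows. This is an illustration under stated assumptions, not the DFS Pro import logic: the required column names, the ISO 8601 timestamp format, and the sample rows are hypothetical.

```python
from datetime import datetime

# Hypothetical required columns for a file-backed import.
REQUIRED = ("asset_id", "timestamp", "value")


def check_row(row):
    """Return a list of row-level errors; an empty list means accepted."""
    errors = []
    for name in REQUIRED:
        if not row.get(name):
            errors.append(f"missing {name}")
    if row.get("timestamp"):
        try:
            # Assumption: timestamps arrive as ISO 8601 strings.
            datetime.fromisoformat(row["timestamp"])
        except ValueError:
            errors.append("unparseable timestamp")
    if row.get("value"):
        try:
            float(row["value"])
        except ValueError:
            errors.append("unparseable value")
    return errors


def run_import(rows):
    """Split rows into accepted rows and rejected rows with their errors."""
    accepted, rejected = [], []
    for row in rows:
        errors = check_row(row)
        if errors:
            rejected.append({"row": row, "errors": errors})
        else:
            accepted.append(row)
    return accepted, rejected


first_run = [
    {"asset_id": "CP-7", "timestamp": "2024-03-01T08:00:00", "value": "4.2"},
    {"asset_id": "", "timestamp": "2024-03-01T08:05:00", "value": "4.4"},
    {"asset_id": "CP-7", "timestamp": "01/03/2024 08:10", "value": "n/a"},
]
corrected_run = [
    first_run[0],
    {"asset_id": "CP-7", "timestamp": "2024-03-01T08:05:00", "value": "4.4"},
    {"asset_id": "CP-7", "timestamp": "2024-03-01T08:10:00", "value": "4.1"},
]

# Compare run totals before and after the source owner's fix.
for label, rows in (("first", first_run), ("corrected", corrected_run)):
    accepted, rejected = run_import(rows)
    print(label, "accepted:", len(accepted), "rejected:", len(rejected))
```

Keeping the `rejected` list from the first run, with its per-row errors, is one way to retain the failed-run evidence while downstream owners review the replacement.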

Assign a steward

Assign a steward when a dataset will be used beyond a one-time test.

The steward should be able to answer:

  • where the data came from;
  • which source system owns it;
  • how often it should update;
  • which columns are required;
  • which quality issues block downstream use;
  • who approves schema changes.

Validate a dataset

Validate a dataset after preview, profile, and source ownership are clear.

Validation checklist:

  • dataset name and description are clear;
  • source type and source ID are correct;
  • time column is correct when the dataset is time-based;
  • metric column is correct when the dataset is metric-based;
  • required columns are present;
  • null and distinct ratios are understood;
  • quality issues are reviewed;
  • steward is assigned;
  • downstream consumers are known.

After validation, the dataset can be used as a more reliable input for fusion tasks, AI Agent workflows, predictive maintenance, or BI reports.
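Parts of the validation checklist can be expressed as a mechanical check over a dataset's metadata. The sketch below assumes a hypothetical metadata dictionary; the keys, the sample values, and the `validate_dataset` function are illustrative and do not reflect the Dataset Detail data model. Judgment items, such as whether null and distinct ratios are understood, still need a human reviewer.

```python
def validate_dataset(meta):
    """Return blocking findings for a hypothetical dataset metadata record."""
    findings = []
    columns = set(meta.get("columns", []))
    if not meta.get("description"):
        findings.append("description is empty")
    if not meta.get("source_id"):
        findings.append("source ID is missing")
    # The time and metric columns only matter for time- or metric-based data.
    if meta.get("time_based") and meta.get("time_column") not in columns:
        findings.append("time column is not in the schema")
    if meta.get("metric_based") and meta.get("metric_column") not in columns:
        findings.append("metric column is not in the schema")
    missing = sorted(set(meta.get("required_columns", [])) - columns)
    if missing:
        findings.append("required columns missing: " + ", ".join(missing))
    if not meta.get("steward"):
        findings.append("steward is not assigned")
    if not meta.get("consumers"):
        findings.append("downstream consumers are not recorded")
    return findings


# Illustrative metadata; every value here is made up for the example.
meta = {
    "name": "Compressor vibration features",
    "description": "Hourly vibration features per compressor.",
    "source_id": "src-001",
    "columns": ["asset_id", "ts", "rms"],
    "time_based": True,
    "time_column": "ts",
    "metric_based": True,
    "metric_column": "rms_velocity",
    "required_columns": ["asset_id", "ts", "rms", "status"],
    "steward": "",
    "consumers": ["predictive maintenance"],
}

for finding in validate_dataset(meta):
    print("BLOCKED:", finding)
```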

Use version history

Dataset version history helps you review schema changes. Use it when:

  • columns are added or removed;
  • column names change;
  • data types change;
  • a downstream dashboard, fusion task, or AI workflow depends on the dataset.

Before making a breaking change, check change impact so downstream users can review the effect.
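A schema comparison between two versions might look like the sketch below. It is a generic illustration, not the version history feature itself. It assumes each version is a mapping of column name to type name, and that removed or retyped columns are breaking while added columns are not; the column names and types are hypothetical.

```python
def schema_diff(old, new):
    """Compare two {column: type} mappings and classify the changes."""
    added = sorted(set(new) - set(old))
    removed = sorted(set(old) - set(new))
    retyped = sorted(c for c in set(old) & set(new) if old[c] != new[c])
    return {
        "added": added,
        "removed": removed,
        "retyped": retyped,
        # Assumption: removals and type changes break consumers; additions do not.
        "breaking": bool(removed or retyped),
    }


# Hypothetical schemas for two versions of the same dataset.
v1 = {"asset_id": "string", "ts": "timestamp", "temp_c": "float"}
v2 = {"asset_id": "string", "ts": "timestamp", "temperature_c": "float", "status": "string"}

print(schema_diff(v1, v2))
```

Note that a renamed column (`temp_c` to `temperature_c` here) appears as one removal plus one addition, so it is flagged as breaking even though no data was lost.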

Use lineage

Lineage shows upstream producers and downstream consumers. Use it before changing or deprecating a dataset.

Ask:

  • Which fusion tasks use this dataset?
  • Which reports use this dataset?
  • Which AI Agent workflows depend on it?
  • Which output datasets were produced from it?
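The questions above amount to walking the lineage graph from one dataset to everything that depends on it, directly or indirectly. The following sketch shows that walk over a hand-written edge map. The edge map, node labels, and `downstream` function are assumptions for illustration; the lineage view in the product is the source of truth.

```python
from collections import deque

# Hypothetical lineage edges: producer -> direct consumers.
LINEAGE = {
    "Chiller plant sensor history": [
        "fusion: chiller health",
        "report: energy dashboard",
    ],
    "fusion: chiller health": ["dataset: chiller health scores"],
    "dataset: chiller health scores": ["agent: maintenance assistant"],
}


def downstream(node, edges):
    """Breadth-first walk returning every direct and indirect consumer."""
    seen, order = set(), []
    queue = deque(edges.get(node, []))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue  # already visited through another path
        seen.add(current)
        order.append(current)
        queue.extend(edges.get(current, []))
    return order


for consumer in downstream("Chiller plant sensor history", LINEAGE):
    print(consumer)
```

The indirect consumers (the output dataset and the agent workflow in this example) are the ones most easily missed when only direct dependencies are checked.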

Data product readiness

A data product is a dataset with enough ownership, quality, lineage, and consumer context for repeated use.

Before publishing or handing off a data product, confirm:

  • the dataset owner and steward are recorded;
  • source contract and refresh cadence are known;
  • profile and quality checks are current;
  • lineage shows the upstream source and downstream consumers;
  • MDM entity IDs or reviewed event IDs are included when the workflow depends on governed identity;
  • AI Agent, BI, Inspector, or maintenance workflows have a clear consumption path;
  • known limitations and open exceptions are documented.

For AI Agent workflows, hand off the dataset with the intended question scope, evidence fields, identity fields, refresh cadence, and review owner. This lets the workflow retrieve operational evidence without treating raw source rows as approved knowledge.

Deprecate a dataset

Deprecate a dataset when a better dataset replaces it or when the source is retired.

Before deprecating:

  1. Check lineage.
  2. Notify downstream owners.
  3. Confirm the replacement dataset when one is needed.
  4. Record why the dataset is being deprecated.

Failure handling

If a dataset cannot be validated, keep it out of shared fusion tasks, BI reports, and AI Agent workflows until the owner resolves the issue.

Failure | Action
Preview unavailable | Confirm source import, connector run, table creation, and dataset permissions.
Required columns missing | Return to source owner, connector mapping, or fusion task configuration.
Null or distinct ratios are unexpected | Review source extraction, filters, deduplication, and source-system changes.
Steward missing | Assign an accountable owner before reuse.
Downstream dependency blocks deprecation | Keep the dataset active until replacement and consumer migration are confirmed.

Next step

Use Fusion Tasks when multiple datasets need to be combined or compared.