
Dataset Lifecycle

Use the dataset lifecycle when a data asset will be reused by fusion tasks, AI Agent workflows, predictive maintenance, BI reports, or facility operations teams.

The lifecycle keeps source data reviewable:

Create -> Preview -> Profile -> Assign steward -> Validate -> Monitor changes -> Deprecate when replaced

Before you start

Confirm:

  • dfs:read and dfs:write access;
  • DFS Pro package access;
  • source owner;
  • intended downstream consumers;
  • expected refresh cadence;
  • required time, identity, and metric columns.

Step 1: Create or select a dataset

Go to:

Data Integration > Dataset Center

Create a dataset when the source needs governance. Select an existing dataset when the data is already available and the task is validation, stewardship, or lifecycle review.

Use a name that describes the operational asset, source, and purpose.

Good examples:

  • Facility chilled water sensor history
  • Normalized work orders by asset
  • Inspection findings with asset reference
  • Compressor vibration feature history

Step 2: Open Dataset Detail

Open the dataset and review the summary panel.

Record:

  • source type;
  • source ID;
  • table name when present;
  • row count;
  • column count;
  • status;
  • steward;
  • validated time when available;
  • deprecation time when available.
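
One way to keep these values reviewable is to capture them in a small record. The sketch below is illustrative only: `DatasetSummary` and its field names are assumptions chosen to mirror the list above, not a schema exposed by Dataset Center.

```python
from dataclasses import dataclass, asdict
from datetime import datetime
from typing import List, Optional


@dataclass
class DatasetSummary:
    """Hand-recorded snapshot of the summary panel (hypothetical field names)."""
    source_type: str
    source_id: str
    row_count: int
    column_count: int
    status: str
    # The remaining fields are only recorded "when present" or "when available".
    table_name: Optional[str] = None
    steward: Optional[str] = None
    validated_at: Optional[datetime] = None
    deprecated_at: Optional[datetime] = None

    def unset_fields(self) -> List[str]:
        """Names of optional fields that have not been recorded yet."""
        return [name for name, value in asdict(self).items() if value is None]
```

Calling `unset_fields()` on a freshly recorded summary shows at a glance that, for example, no steward or validated time has been noted yet.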

Step 3: Preview rows

Use the Preview tab to inspect sample rows.

Check:

  • timestamps are parseable;
  • asset or equipment identity is stable;
  • metric names are understandable;
  • values match expected units;
  • rows represent the intended source scope;
  • sensitive fields are suitable for the intended workflow.
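
The first two checks can be spot-checked on exported sample rows. The following is a minimal sketch, assuming the preview rows are available as a list of dictionaries and that timestamps are expected in ISO 8601 form. The function name and the column names `ts` and `asset_id` are hypothetical, and the sketch only tests that an identity value is present, not that it is stable over time.

```python
from datetime import datetime


def check_preview_rows(rows, time_col, id_col):
    """Return (row_index, problem) tuples for rows that fail a basic check."""
    problems = []
    for i, row in enumerate(rows):
        raw_ts = row.get(time_col)
        try:
            # Assumption: timestamps should be ISO 8601 strings.
            datetime.fromisoformat(str(raw_ts))
        except ValueError:
            problems.append((i, f"unparseable timestamp: {raw_ts!r}"))
        asset = row.get(id_col)
        if asset is None or str(asset).strip() == "":
            problems.append((i, "missing asset identity"))
    return problems


sample = [
    {"ts": "2024-01-05T10:00:00", "asset_id": "CH-01"},
    {"ts": "05/01/2024 10:00", "asset_id": "CH-01"},   # not ISO 8601
    {"ts": "2024-01-05T10:10:00", "asset_id": ""},     # empty identity
]
for index, problem in check_preview_rows(sample, "ts", "asset_id"):
    print(index, problem)
```

Unit, scope, and sensitivity checks still need a human reviewer who knows the source.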

If preview is unavailable, confirm whether the dataset has a queryable table and whether the source import or connector run has completed.

Step 4: Profile columns

Use the Profile tab to compute column statistics.

Review:

  • row count;
  • column names;
  • data types;
  • null ratio;
  • distinct values;
  • min and max values when available;
  • computed time.

Use the profile results to decide whether the dataset is ready for validation or needs repair at the source.
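
To make the statistics concrete, here is a minimal sketch of how null ratio, distinct count, and min/max could be computed for a single column. It assumes the column is available as a plain Python list with `None` for missing values; `profile_column` is a hypothetical helper and does not describe how the Profile tab computes its results.

```python
def profile_column(values):
    """Compute basic statistics for one column given as a list of values."""
    total = len(values)
    present = [v for v in values if v is not None]
    return {
        "row_count": total,
        # Share of rows where the value is missing.
        "null_ratio": (total - len(present)) / total if total else 0.0,
        "distinct": len(set(present)),
        # min/max are only meaningful when at least one value is present.
        "min": min(present) if present else None,
        "max": max(present) if present else None,
    }


print(profile_column([6.5, None, 7.1, 6.5]))
```

A high null ratio on a required column, or a min/max outside the expected unit range, is usually a sign that the source needs repair before validation.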

Step 5: Assign a steward

Assign a steward before validation.

The steward is the person or team responsible for answering:

  • where the data came from;
  • which source system owns it;
  • how often it should refresh;
  • which columns are required;
  • which downstream workflows can use it;
  • which quality issues need source correction.

Before filling in the steward field, confirm the user ID format that the deployment expects.

Step 6: Validate the dataset

Validate after preview, profile, and stewardship are complete.

Validation checklist:

  • dataset name and description are clear;
  • source owner is known;
  • required columns are present;
  • timestamp and identity columns are correct;
  • quality issues are reviewed;
  • downstream consumers are known;
  • steward is assigned;
  • source refresh cadence is documented.
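
The checklist can also be tracked as data so that unmet items are easy to list. This is a sketch under stated assumptions: the item keys are hypothetical labels for the bullets above, and the answers are recorded by the reviewer rather than read from the product.

```python
# Hypothetical keys, one per checklist bullet above.
CHECKLIST = [
    "name_and_description_clear",
    "source_owner_known",
    "required_columns_present",
    "time_and_identity_columns_correct",
    "quality_issues_reviewed",
    "downstream_consumers_known",
    "steward_assigned",
    "refresh_cadence_documented",
]


def unmet_items(answers):
    """Return checklist items that were not explicitly confirmed as True."""
    return [item for item in CHECKLIST if answers.get(item) is not True]


def ready_to_validate(answers):
    """A dataset is ready only when every checklist item is confirmed."""
    return not unmet_items(answers)
```

Treating an unanswered item as unmet is a deliberate choice here: a missing answer should block validation rather than pass silently.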

Prefer validated datasets when choosing inputs for fusion tasks, BI reports, AI Agent workflows, and predictive maintenance workflows.

Step 7: Review versions

Use the Versions tab when schema or source logic changes.

Review versions before changing:

  • column names;
  • column types;
  • time columns;
  • metric columns;
  • table name;
  • source query;
  • transformation logic.

When a version change affects downstream consumers, record the reason and coordinate with the owners of dependent workflows.
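
A schema comparison between two versions helps show which changes could affect downstream consumers. The sketch below assumes each version's schema has been written down as a mapping of column name to type name; `diff_schema` and the example columns are hypothetical and not part of the Versions tab.

```python
def diff_schema(old, new):
    """Compare two {column_name: type_name} mappings."""
    old_cols, new_cols = set(old), set(new)
    return {
        "added": sorted(new_cols - old_cols),
        "removed": sorted(old_cols - new_cols),
        # Columns kept under the same name but with a different type.
        "retyped": sorted(c for c in old_cols & new_cols if old[c] != new[c]),
    }


v1 = {"ts": "timestamp", "asset_id": "string", "temp_c": "float"}
v2 = {"ts": "string", "asset_id": "string", "supply_temp_c": "float"}
print(diff_schema(v1, v2))
```

Note that a renamed column appears as one removed and one added column, which is how a downstream consumer that references the old name would experience it.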

Step 8: Check lineage

Use the Lineage tab before changing or deprecating a dataset.

Look for:

  • upstream fusion task or pipeline;
  • fusion tasks that consume the dataset;
  • downstream governance pipelines;
  • output datasets produced from it.

Lineage is the quickest way to understand the blast radius of a lifecycle change before you make it.
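
Blast radius can be thought of as everything reachable downstream of the dataset. The following sketch uses a breadth-first traversal over producer-to-consumer edges. It assumes the lineage has been noted as simple `(producer, consumer)` pairs; the node names are invented for illustration and the function is not part of the Lineage tab.

```python
from collections import deque


def downstream_of(edges, start):
    """Return every node reachable from `start` via producer -> consumer edges."""
    children = {}
    for producer, consumer in edges:
        children.setdefault(producer, []).append(consumer)

    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for child in children.get(node, []):
            # Skip nodes already visited so cycles cannot loop forever.
            if child not in seen and child != start:
                seen.add(child)
                queue.append(child)
    return seen


edges = [
    ("sensor_history", "fusion_task_a"),
    ("fusion_task_a", "fused_output"),
    ("fused_output", "bi_report"),
    ("work_orders", "fusion_task_b"),
]
print(sorted(downstream_of(edges, "sensor_history")))
```

Every node in the result has an owner who should hear about the change before it happens.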

Step 9: Deprecate a dataset

Deprecate when a dataset is replaced, retired, or no longer suitable for reuse.

Before deprecating:

  1. Check lineage.
  2. Identify downstream owners.
  3. Confirm the replacement dataset when one exists.
  4. Record the reason.
  5. Notify users before changing reports or workflows.

Deprecation updates the dataset's status and preserves its review history.
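
The steps to complete before deprecating can be expressed as a small gate that lists what is still outstanding. This is a sketch only: `DeprecationRequest` and its fields are hypothetical, and the replacement is optional because not every dataset has one.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DeprecationRequest:
    """Reviewer's notes for one planned deprecation (hypothetical structure)."""
    dataset: str
    reason: str = ""
    lineage_checked: bool = False
    downstream_owners: List[str] = field(default_factory=list)
    notified_owners: List[str] = field(default_factory=list)
    replacement: Optional[str] = None  # only set when a replacement exists

    def blockers(self) -> List[str]:
        """Return the steps that still need to happen before deprecating."""
        issues = []
        if not self.lineage_checked:
            issues.append("lineage not checked")
        if not self.reason.strip():
            issues.append("reason not recorded")
        unnotified = sorted(set(self.downstream_owners) - set(self.notified_owners))
        if unnotified:
            issues.append("owners not notified: " + ", ".join(unnotified))
        return issues
```

An empty list from `blockers()` means the recorded steps are complete; it does not replace the judgment call about whether the dataset should be retired.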

Operating routine

For important datasets, review the following each week:

  • latest preview;
  • profile changes;
  • lineage changes;
  • quality issues;
  • downstream complaints;
  • steward assignment;
  • status.
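
Profile changes are easier to spot when this week's profile is compared with last week's. The sketch below assumes each profile has been saved as a mapping of column name to a dictionary with a `null_ratio` entry. The function name and the 0.05 threshold are assumptions chosen for illustration, not product defaults.

```python
def profile_drift(previous, current, max_null_increase=0.05):
    """Flag columns that disappeared or whose null ratio rose past the threshold."""
    findings = []
    for column, old in previous.items():
        new = current.get(column)
        if new is None:
            findings.append(f"{column}: missing from current profile")
        elif new["null_ratio"] - old["null_ratio"] > max_null_increase:
            findings.append(
                f"{column}: null ratio {old['null_ratio']:.2f} -> {new['null_ratio']:.2f}"
            )
    return findings


last_week = {"temp_c": {"null_ratio": 0.01}, "flow": {"null_ratio": 0.0}}
this_week = {"temp_c": {"null_ratio": 0.20}}
for finding in profile_drift(last_week, this_week):
    print(finding)
```

Findings from a comparison like this are a prompt for the steward to investigate, not a verdict on data quality.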

Next step

Use Governance Studio when datasets and methods need a repeatable processing pipeline.