Dataset Lifecycle
Use the dataset lifecycle when a data asset will be reused by fusion tasks, AI Agent workflows, predictive maintenance, BI reports, or facility operations teams.
The lifecycle keeps source data reviewable:
Create -> Preview -> Profile -> Assign steward -> Validate -> Monitor changes -> Deprecate when replaced
Before you start
Confirm:
- dfs:read and dfs:write access;
- DFS Pro package access;
- source owner;
- intended downstream consumers;
- expected refresh cadence;
- required time, identity, and metric columns.
Step 1: Create or select a dataset
Go to:
Data Integration > Dataset Center
Create a dataset when the source needs governance. Select an existing dataset when the data is already available and the task is validation, stewardship, or lifecycle review.
Use a name that describes the operational asset, source, and purpose.
Good examples:
- Facility chilled water sensor history
- Normalized work orders by asset
- Inspection findings with asset reference
- Compressor vibration feature history
Step 2: Open Dataset Detail
Open the dataset and review the summary panel.
Record:
- source type;
- source ID;
- table name when present;
- row count;
- column count;
- status;
- steward;
- validated time when available;
- deprecation time when available.
Step 3: Preview rows
Use the Preview tab to inspect sample rows.
Check:
- timestamps are parseable;
- asset or equipment identity is stable;
- metric names are understandable;
- values match expected units;
- rows represent the intended source scope;
- sensitive fields are suitable for the intended workflow.
If preview is unavailable, confirm whether the dataset has a queryable table and whether the source import or connector run has completed.
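The first two preview checks can also be run outside the UI on exported sample rows. The following is a minimal sketch, assuming the rows are available as a list of dictionaries and that timestamps are expected in ISO 8601 form. The function name `check_preview_rows` and the column names `ts` and `asset_id` are hypothetical and are not part of the product.

```python
from datetime import datetime


def check_preview_rows(rows, time_col, id_col):
    """Return (row_index, problem) tuples for sample rows that fail basic checks."""
    problems = []
    for i, row in enumerate(rows):
        raw_ts = row.get(time_col)
        try:
            # Assumption: timestamps are ISO 8601 strings.
            datetime.fromisoformat(str(raw_ts))
        except ValueError:
            problems.append((i, f"unparseable timestamp: {raw_ts!r}"))
        ident = row.get(id_col)
        if ident is None or str(ident).strip() == "":
            problems.append((i, "missing identity value"))
    return problems


# Illustrative sample rows, not real data.
sample = [
    {"ts": "2024-05-01T00:00:00", "asset_id": "CH-01", "value": 6.8},
    {"ts": "not a time", "asset_id": "CH-01", "value": 6.9},
    {"ts": "2024-05-01T00:10:00", "asset_id": "", "value": 7.0},
]
for index, problem in check_preview_rows(sample, "ts", "asset_id"):
    print(index, problem)
```

Unit, scope, and sensitivity checks still need a human reviewer; this sketch only covers what can be tested mechanically.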
Step 4: Profile columns
Use the Profile tab to compute column statistics.
Review:
- row count;
- column names;
- data types;
- null ratio;
- distinct values;
- min and max values when available;
- computed time.
Profile results help decide whether the dataset is ready for validation or needs source repair.
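To make the reviewed statistics concrete, here is a minimal sketch of how null ratio, distinct count, and min and max could be computed over sample rows. It is an illustration only and does not describe how the Profile tab computes its results. The function name `profile_columns` and the sample columns are hypothetical.

```python
def profile_columns(rows):
    """Compute per-column null ratio, distinct count, min, and max."""
    columns = sorted({key for row in rows for key in row})
    total = len(rows)
    profile = {}
    for col in columns:
        values = [row.get(col) for row in rows]
        present = [v for v in values if v is not None]
        stats = {
            "null_ratio": (total - len(present)) / total if total else 0.0,
            "distinct": len(set(present)),
        }
        # min and max are reported only when the values can be compared.
        try:
            stats["min"], stats["max"] = min(present), max(present)
        except (TypeError, ValueError):
            stats["min"] = stats["max"] = None
        profile[col] = stats
    return profile


# Illustrative sample rows, not real data.
rows = [
    {"asset_id": "CH-01", "temp_c": 6.8},
    {"asset_id": "CH-01", "temp_c": None},
    {"asset_id": "CH-02", "temp_c": 7.4},
    {"asset_id": "CH-02", "temp_c": 7.0},
]
for column, stats in profile_columns(rows).items():
    print(column, stats)
```

A high null ratio on a required column, or a distinct count of 1 on an identity column, is usually a reason to repair the source before validating.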
Step 5: Assign a steward
Assign a steward before validation.
The steward is the person or team responsible for answering:
- where the data came from;
- which source system owns it;
- how often it should refresh;
- which columns are required;
- which downstream workflows can use it;
- which quality issues need source correction.
Fill in the steward field only after confirming the user ID format that the deployment expects.
Step 6: Validate the dataset
Validate after preview, profile, and stewardship are complete.
Validation checklist:
- dataset name and description are clear;
- source owner is known;
- required columns are present;
- timestamp and identity columns are correct;
- quality issues are reviewed;
- downstream consumers are known;
- steward is assigned;
- source refresh cadence is documented.
A validated dataset has a known owner, steward, required columns, and refresh cadence, which makes it a safer input for fusion tasks, BI reports, AI Agent workflows, and predictive maintenance workflows.
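The checklist above can be expressed as a simple gate that lists open items before anyone marks a dataset as validated. This is a minimal sketch under stated assumptions: `DatasetRecord` and `validation_gaps` are hypothetical names, the fields are filled in by hand from the earlier steps, and the example values are illustrative. Whether the timestamp and identity columns are correct still needs a reviewer; the sketch only checks that required columns exist.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetRecord:
    """Hypothetical record of the facts gathered before validation."""
    name: str
    description: str = ""
    source_owner: str = ""
    steward: str = ""
    refresh_cadence: str = ""
    columns: list = field(default_factory=list)
    required_columns: list = field(default_factory=list)
    downstream_consumers: list = field(default_factory=list)
    quality_issues_reviewed: bool = False


def validation_gaps(ds):
    """Return the checklist items that are still open for this dataset."""
    gaps = []
    if not ds.name.strip() or not ds.description.strip():
        gaps.append("name or description is missing")
    if not ds.source_owner:
        gaps.append("source owner is unknown")
    missing = [c for c in ds.required_columns if c not in ds.columns]
    if missing:
        gaps.append("required columns missing: " + ", ".join(missing))
    if not ds.quality_issues_reviewed:
        gaps.append("quality issues not reviewed")
    if not ds.downstream_consumers:
        gaps.append("downstream consumers are unknown")
    if not ds.steward:
        gaps.append("steward is not assigned")
    if not ds.refresh_cadence:
        gaps.append("refresh cadence is not documented")
    return gaps


# Illustrative candidate with three open items.
candidate = DatasetRecord(
    name="Facility chilled water sensor history",
    description="Chilled water loop sensor readings",
    source_owner="facility-ops",
    columns=["ts", "asset_id", "temp_c"],
    required_columns=["ts", "asset_id", "metric"],
    downstream_consumers=["predictive maintenance"],
    quality_issues_reviewed=True,
)
for gap in validation_gaps(candidate):
    print("open:", gap)
```

An empty result means every mechanical check passed; it does not replace the steward's judgment.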
Step 7: Review versions
Use the Versions tab when schema or source logic changes.
Review versions before changing:
- column names;
- column types;
- time columns;
- metric columns;
- table name;
- source query;
- transformation logic.
When a version change affects downstream consumers, record the reason and coordinate with the owners of dependent workflows.
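When recording the reason for a version change, it helps to state exactly which columns were added, removed, or retyped. The sketch below compares two schemas, assuming each is available as a mapping from column name to a type string. The function name `diff_schema` and the example schemas are hypothetical; the Versions tab may present this information differently.

```python
def diff_schema(old, new):
    """Compare two {column_name: type_string} mappings."""
    old_cols, new_cols = set(old), set(new)
    return {
        "added": sorted(new_cols - old_cols),
        "removed": sorted(old_cols - new_cols),
        # Columns present in both versions whose declared type changed.
        "retyped": sorted(c for c in old_cols & new_cols if old[c] != new[c]),
    }


# Illustrative schemas, not real data.
v1 = {"ts": "timestamp", "asset_id": "string", "temp_c": "float"}
v2 = {"ts": "string", "asset_id": "string", "temp_k": "float"}
print(diff_schema(v1, v2))
```

A removed or retyped column is the kind of change that typically needs coordination with downstream owners, while an added column is usually lower risk.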
Step 8: Check lineage
Use the Lineage tab before changing or deprecating a dataset.
Look for:
- upstream fusion task or pipeline;
- fusion tasks that consume the dataset;
- downstream governance pipelines;
- output datasets produced from it.
Lineage is the quickest way to see which downstream tasks, pipelines, and datasets a lifecycle change would affect (its blast radius).
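The set of affected consumers can be thought of as everything reachable downstream from the dataset in the lineage graph. Here is a minimal sketch, assuming lineage has been written down as (upstream, downstream) pairs. The function name `downstream_of` and the node names are hypothetical, and the sketch does not read from the Lineage tab.

```python
from collections import deque


def downstream_of(edges, start):
    """Return every node reachable from start, in breadth-first order."""
    graph = {}
    for src, dst in edges:
        graph.setdefault(src, []).append(dst)
    seen = set()
    order = []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, []):
            # Skip nodes already visited so cycles do not loop forever.
            if nxt not in seen and nxt != start:
                seen.add(nxt)
                order.append(nxt)
                queue.append(nxt)
    return order


# Illustrative lineage edges, not real data.
edges = [
    ("raw_sensors", "fusion_task_a"),
    ("fusion_task_a", "features"),
    ("features", "bi_report"),
    ("features", "pdm_model"),
    ("other_source", "bi_report"),
]
print(downstream_of(edges, "raw_sensors"))
```

Nodes closest to the dataset appear first, which gives a reasonable order for contacting owners.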
Step 9: Deprecate a dataset
Deprecate when a dataset is replaced, retired, or no longer suitable for reuse.
Before deprecating:
- Check lineage.
- Identify downstream owners.
- Confirm the replacement dataset when one exists.
- Record the reason.
- Notify users before changing reports or workflows.
Deprecation marks the dataset as deprecated and preserves its review history.
Operating routine
For important datasets, review the following every week:
- latest preview;
- profile changes;
- lineage changes;
- quality issues;
- downstream complaints;
- steward assignment;
- status.
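For the profile changes item, comparing this week's profile with last week's makes drift easier to spot. The following is a minimal sketch, assuming each profile snapshot has been saved as a mapping from column name to a dictionary with a `null_ratio` entry. The function name `profile_drift` and the 0.05 tolerance are hypothetical choices, not product defaults.

```python
def profile_drift(previous, current, null_ratio_tolerance=0.05):
    """List column-level changes between two profile snapshots."""
    findings = []
    for col in sorted(set(previous) | set(current)):
        if col not in current:
            findings.append(f"{col}: column disappeared")
            continue
        if col not in previous:
            findings.append(f"{col}: new column")
            continue
        delta = current[col]["null_ratio"] - previous[col]["null_ratio"]
        # Only a rise in nulls beyond the tolerance is reported.
        if delta > null_ratio_tolerance:
            findings.append(f"{col}: null ratio rose by {delta:.2f}")
    return findings


# Illustrative snapshots, not real data.
last_week = {"temp_c": {"null_ratio": 0.01}, "asset_id": {"null_ratio": 0.0}}
this_week = {"temp_c": {"null_ratio": 0.20}, "flow": {"null_ratio": 0.0}}
for finding in profile_drift(last_week, this_week):
    print(finding)
```

Findings from a comparison like this are candidates for the quality issues item in the same weekly review.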
Next step
Use Governance Studio when datasets and methods need a repeatable processing pipeline.