Dataset Lifecycle
Use the dataset lifecycle when a data asset will be reused by fusion tasks, AI Agent workflows, predictive maintenance, BI reports, or facility operations teams.
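The lifecycle keeps source data reviewable from creation through preview, profiling, stewardship, validation, version review, lineage checks, and deprecation.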
Before you start
Confirm:
- dfs:read and dfs:write access;
- DFS Pro package access;
- source owner;
- source contract or intake record from DFS Lite when the dataset comes from a connector;
- intended downstream consumers;
- expected refresh cadence;
- required time, identity, and metric columns.
Step 1: Create or select a dataset
Go to:
Data Integration > Dataset Center
Create a dataset when the source needs governance. Select an existing dataset when the data is already available and the task is validation, stewardship, or lifecycle review.
Use a name that describes the operational asset, source, and purpose.
Good examples:
- Facility chilled water sensor history
- Normalized work orders by asset
- Inspection findings with asset reference
- Compressor vibration feature history
When the dataset is promoted from DFS Lite, carry over the source owner, connector reference, refresh cadence, required fields, latest sync evidence, and known source limitations. This gives later reviewers the reason the dataset exists and the conditions under which it can be reused.
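The carry-over items can be tracked as a simple record so that gaps are visible before the dataset is promoted. The following is a minimal sketch, not a DFS API: the `PromotionRecord` class, its field names, and the `missing_items` helper are all hypothetical and exist only for illustration.

```python
from dataclasses import dataclass, field, fields


@dataclass
class PromotionRecord:
    """Hypothetical container for metadata carried over from DFS Lite."""
    source_owner: str = ""
    connector_reference: str = ""
    refresh_cadence: str = ""
    required_fields: list = field(default_factory=list)
    latest_sync_evidence: str = ""
    known_source_limitations: str = ""


def missing_items(record):
    """Return the names of carry-over items that are still empty."""
    return [f.name for f in fields(record) if not getattr(record, f.name)]


record = PromotionRecord(
    source_owner="facilities-team",      # illustrative values only
    connector_reference="conn-001",
    refresh_cadence="hourly",
    required_fields=["timestamp", "asset_id", "value"],
)
print(missing_items(record))
```

If a source has no known limitations, record that explicitly (for example "none known") so the field is not reported as missing.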
Step 2: Open Dataset Detail
Open the dataset and review the summary panel.
Record:
- source type;
- source ID;
- table name when present;
- row count;
- column count;
- status;
- steward;
- validated time when available;
- deprecation time when available.
Step 3: Preview rows
Use the Preview tab to inspect sample rows.
Check:
- timestamps are parseable;
- asset or equipment identity is stable;
- metric names are understandable;
- values match expected units;
- rows represent the intended source scope;
- sensitive fields are suitable for the intended workflow.
If preview is unavailable, confirm whether the dataset has a queryable table and whether the source import or connector run has completed.
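Two of the preview checks, parseable timestamps and present identity values, can also be run offline against exported sample rows. The sketch below assumes the rows are available as Python dictionaries and that timestamps are ISO 8601 strings; the function name and the column names are hypothetical, not part of the product.

```python
from datetime import datetime


def check_preview_rows(rows, time_col, id_col):
    """Return (row_index, problem) pairs for sample rows that fail a check."""
    problems = []
    for i, row in enumerate(rows):
        raw_ts = row.get(time_col)
        try:
            # Assumption: timestamps are ISO 8601, e.g. 2024-05-01T10:00:00
            datetime.fromisoformat(str(raw_ts))
        except ValueError:
            problems.append((i, f"unparseable timestamp: {raw_ts!r}"))
        ident = row.get(id_col)
        if ident is None or str(ident).strip() == "":
            problems.append((i, "missing identity value"))
    return problems


sample = [
    {"ts": "2024-05-01T10:00:00", "asset_id": "CH-01"},
    {"ts": "05/01/2024", "asset_id": ""},
]
for index, problem in check_preview_rows(sample, "ts", "asset_id"):
    print(index, problem)
```

Unit, scope, and sensitivity checks still need a reviewer who knows the source; this sketch only covers the mechanical ones.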
Step 4: Profile columns
Use the Profile tab to compute column statistics.
Review:
- row count;
- column names;
- data types;
- null ratio;
- distinct values;
- min and max values when available;
- computed time.
Profile results help you decide whether the dataset is ready for validation or needs source repair.
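To make the statistics concrete, the sketch below computes null ratio, distinct count, and min and max for each column of a small in-memory sample. It illustrates what these figures mean and is not how the Profile tab is implemented; `profile_columns` and the sample data are assumptions.

```python
def profile_columns(rows):
    """Compute per-column statistics from a list of dict rows."""
    columns = sorted({key for row in rows for key in row})
    total = len(rows)
    profile = {}
    for col in columns:
        # A missing key is treated the same as an explicit null.
        present = [row[col] for row in rows if row.get(col) is not None]
        stats = {
            "null_ratio": (total - len(present)) / total if total else 0.0,
            "distinct": len(set(present)),
        }
        try:
            stats["min"], stats["max"] = min(present), max(present)
        except (TypeError, ValueError):
            # Mixed types or an all-null column: min and max are unavailable.
            stats["min"] = stats["max"] = None
        profile[col] = stats
    return profile


sample = [
    {"asset": "A", "temp": 6.5},
    {"asset": "A", "temp": None},
    {"asset": "B", "temp": 7.1},
    {"asset": "B"},
]
for column, stats in profile_columns(sample).items():
    print(column, stats)
```

A null ratio of 0.5 on a required metric column, as in this sample, would usually point to source repair rather than validation.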
Step 5: Assign a steward
Assign a steward before validation.
The steward is the person or team responsible for answering:
- where the data came from;
- which source system owns it;
- how often it should refresh;
- which columns are required;
- which downstream workflows can use it;
- which quality issues need source correction.
Use the steward field only after confirming the user ID format expected by the deployment.
Step 6: Validate the dataset
Validate after preview, profile, and stewardship are complete.
Validation checklist:
- dataset name and description are clear;
- source owner is known;
- required columns are present;
- timestamp and identity columns are correct;
- quality issues are reviewed;
- downstream consumers are known;
- steward is assigned;
- source refresh cadence is documented.
Validated datasets are better candidates for fusion tasks, BI reports, AI Agent workflows, and predictive maintenance workflows.
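The checklist can be kept as explicit evidence so that an incomplete review is easy to spot. This is a minimal sketch under the assumption that each check is recorded as a boolean by the reviewer; the check keys and the `failed_checks` helper are hypothetical names, not product fields.

```python
# Hypothetical keys, one per item in the validation checklist above.
VALIDATION_CHECKS = (
    "name_and_description_clear",
    "source_owner_known",
    "required_columns_present",
    "timestamp_and_identity_correct",
    "quality_issues_reviewed",
    "downstream_consumers_known",
    "steward_assigned",
    "refresh_cadence_documented",
)


def failed_checks(evidence):
    """Return checks that are missing or not explicitly confirmed as True."""
    return [c for c in VALIDATION_CHECKS if evidence.get(c) is not True]


evidence = {check: True for check in VALIDATION_CHECKS}
evidence["steward_assigned"] = False       # steward not yet confirmed
del evidence["quality_issues_reviewed"]    # check never recorded

print(failed_checks(evidence))
```

Treating an unrecorded check as failed keeps a dataset from being validated on incomplete evidence.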
Import and reprocess checks
When a dataset is created from a file import, connector snapshot, or reprocessed source slice, compare the run totals before validation:
| Check | What to confirm |
|---|---|
| Accepted rows | The accepted count matches the source owner's expected scope. |
| Rejected rows | Row-level errors are visible and assigned to the right owner for correction. |
| Schema match | Required fields, data types, and column names match the dataset contract. |
| Identity and time fields | Asset, equipment, event, or timestamp fields can support the intended workflow. |
| Reprocess result | A corrected source slice reduces the expected error count and preserves lineage. |
Keep failed import evidence until the replacement dataset has been validated and downstream consumers agree to use it.
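The run-total comparison can be sketched as a small function. It assumes run summaries are available as dictionaries with `accepted`, `rejected`, and an optional `errors` list; these keys, the function name, and the exact-match rule for accepted rows are illustrative assumptions, not a documented export format.

```python
def review_reprocess(first_run, reprocessed_run, expected_rows):
    """Return findings that should block validation of a reprocessed import."""
    findings = []
    accepted = reprocessed_run["accepted"]
    if accepted != expected_rows:
        findings.append(f"accepted {accepted} rows, expected {expected_rows}")
    # A corrected source slice should lower the rejected count.
    if first_run["rejected"] > 0 and reprocessed_run["rejected"] >= first_run["rejected"]:
        findings.append("reprocess did not reduce rejected rows")
    # Every remaining row-level error needs an owner for correction.
    unowned = [e["row"] for e in reprocessed_run.get("errors", []) if not e.get("owner")]
    if unowned:
        findings.append(f"rejected rows without an owner: {unowned}")
    return findings


first = {"accepted": 950, "rejected": 50}
second = {
    "accepted": 990,
    "rejected": 10,
    "errors": [{"row": 17, "owner": "facilities"}, {"row": 42, "owner": ""}],
}
for finding in review_reprocess(first, second, expected_rows=1000):
    print(finding)
```

If the source owner accepts a small shortfall, replace the exact match with an agreed tolerance and record that agreement with the import evidence.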
Step 7: Review versions
Use the Versions tab when schema or source logic changes.
Review versions before changing:
- column names;
- column types;
- time columns;
- metric columns;
- table name;
- source query;
- transformation logic.
When a version change affects downstream consumers, record the reason and coordinate with the owners of dependent workflows.
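A schema comparison between two versions shows which changes are likely to affect consumers. The sketch below assumes each version's schema is available as a mapping of column name to type string; `diff_schema` and the example columns are hypothetical.

```python
def diff_schema(old, new):
    """Compare two {column_name: type} mappings and list the differences."""
    old_cols, new_cols = set(old), set(new)
    return {
        "removed": sorted(old_cols - new_cols),
        "added": sorted(new_cols - old_cols),
        "retyped": sorted(c for c in old_cols & new_cols if old[c] != new[c]),
    }


v1 = {"ts": "timestamp", "asset_id": "string", "temp_c": "float"}
v2 = {"event_time": "timestamp", "asset_id": "int", "temp_c": "float"}

print(diff_schema(v1, v2))
```

A rename appears here as one removed and one added column. Removed and retyped columns are the changes most worth raising with the owners of dependent workflows.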
Step 8: Check lineage
Use the Lineage tab before changing or deprecating a dataset.
Look for:
- upstream fusion task or pipeline;
- fusion tasks that consume the dataset;
- downstream governance pipelines;
- output datasets produced from it.
Lineage is the quickest way to understand blast radius before making lifecycle changes.
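Blast radius is the set of everything reachable downstream from the dataset, not only its direct consumers. As a rough sketch, assuming lineage has been written down as a mapping from each node to its direct consumers (the node names and the `downstream_of` function are hypothetical), a breadth-first traversal lists the full set:

```python
from collections import deque


def downstream_of(start, edges):
    """Return every node reachable downstream of `start`, nearest first."""
    seen = {start}
    order = []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for consumer in edges.get(node, []):
            if consumer not in seen:   # guards against cycles and repeats
                seen.add(consumer)
                order.append(consumer)
                queue.append(consumer)
    return order


# Hypothetical lineage: node -> direct consumers
lineage = {
    "ds_sensor_history": ["fusion_task_a", "governance_pipeline_q"],
    "fusion_task_a": ["ds_vibration_features"],
    "ds_vibration_features": ["bi_report_x"],
}
print(downstream_of("ds_sensor_history", lineage))
```

In this example a change to the sensor history dataset reaches a BI report two hops away, which a check of direct consumers alone would miss.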
Step 9: Deprecate a dataset
Deprecate when a dataset is replaced, retired, or no longer suitable for reuse.
Before deprecating:
- Check lineage.
- Identify downstream owners.
- Confirm the replacement dataset when one exists.
- Record the reason.
- Notify users before changing reports or workflows.
Deprecation marks the dataset as deprecated while preserving its review history.
Failure handling
Do not force a dataset through validation when evidence is incomplete.
| Problem | Action |
|---|---|
| Preview does not match source intent | Return to connector, import, or fusion output configuration. |
| Profile shows unexpected nulls or distinct values | Review source extraction, mappings, filters, and source-system changes. |
| Steward cannot be assigned | Keep the dataset in draft or limited pilot use until ownership is clear. |
| Validation fails | Record failed checks and the owner responsible for correction. |
| Deprecation affects reports or workflows | Keep the dataset active until downstream owners confirm migration. |
Operating routine
For important datasets, review weekly:
- latest preview;
- profile changes;
- lineage changes;
- quality issues;
- downstream complaints;
- steward assignment;
- status.
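For the profile-changes item, comparing null ratios between two weekly profile snapshots is one simple way to spot drift. The sketch below assumes each snapshot is a mapping of column name to null ratio; the function name and the 0.05 threshold are illustrative assumptions to be tuned per dataset.

```python
def null_ratio_drift(previous, current, threshold=0.05):
    """Return columns whose null ratio moved by more than `threshold`."""
    flagged = {}
    for col, now in current.items():
        before = previous.get(col)
        # Columns absent from the previous snapshot are schema changes,
        # not drift, so they are skipped here.
        if before is not None and abs(now - before) > threshold:
            flagged[col] = (before, now)
    return flagged


last_week = {"temp_c": 0.01, "asset_id": 0.0}
this_week = {"temp_c": 0.20, "asset_id": 0.0, "humidity": 0.5}

print(null_ratio_drift(last_week, this_week))
```

A flagged column is a prompt to review source extraction, mappings, and filters with the steward, not proof of a defect.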
Next step
Use Governance Studio when datasets and methods need a repeatable processing pipeline.