Dataset Lifecycle

Use the dataset lifecycle when a data asset will be reused by fusion tasks, AI Agent workflows, predictive maintenance, BI reports, or facility operations teams.

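The lifecycle keeps source data reviewable at every stage, from creation through preview, profiling, stewardship, validation, versioning, lineage review, and deprecation.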

Before you start

Confirm:

dfs:read and dfs:write access;
DFS Pro package access;
source owner;
source contract or intake record from DFS Lite when the dataset comes from a connector;
intended downstream consumers;
expected refresh cadence;
required time, identity, and metric columns.

Step 1: Create or select a dataset

Go to:

Data Integration > Dataset Center

Create a dataset when the source needs governance. Select an existing dataset when the data is already available and the task is validation, stewardship, or lifecycle review.

Use a name that describes the operational asset, source, and purpose.

Good examples:

Facility chilled water sensor history
Normalized work orders by asset
Inspection findings with asset reference
Compressor vibration feature history

When the dataset is promoted from DFS Lite, carry over the source owner, connector reference, refresh cadence, required fields, latest sync evidence, and known source limitations. This gives later reviewers the reason the dataset exists and the conditions under which it can be reused.

Step 2: Open Dataset Detail

Open the dataset and review the summary panel.

Record:

source type;
source ID;
table name when present;
row count;
column count;
status;
steward;
validated time when available;
deprecation time when available.

Step 3: Preview rows

Use the Preview tab to inspect sample rows.

Check:

timestamps are parseable;
asset or equipment identity is stable;
metric names are understandable;
values match expected units;
rows represent the intended source scope;
sensitive fields are suitable for the intended workflow.

If preview is unavailable, confirm whether the dataset has a queryable table and whether the source import or connector run has completed.
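The timestamp and identity checks above can also be run against a few exported sample rows. The following is a minimal sketch, assuming the preview rows are available as a list of dictionaries; the column names `ts` and `asset_id`, the function name, and the ISO 8601 timestamp expectation are assumptions for illustration, not product behavior.

```python
from datetime import datetime

def check_preview_rows(rows, time_col, id_col):
    """Return (row_index, problem) pairs for sampled preview rows."""
    problems = []
    for i, row in enumerate(rows):
        ts = row.get(time_col)
        try:
            # Assumption: timestamps are expected in ISO 8601 form.
            datetime.fromisoformat(str(ts))
        except ValueError:
            problems.append((i, f"unparseable timestamp: {ts!r}"))
        # An empty or missing identity cannot support asset-level workflows.
        if not row.get(id_col):
            problems.append((i, "missing asset identity"))
    return problems

# Hypothetical sample rows, not real data.
sample = [
    {"ts": "2024-05-01T10:00:00", "asset_id": "CH-01", "value": 6.8},
    {"ts": "01/05/2024 10:05", "asset_id": "CH-01", "value": 6.9},
    {"ts": "2024-05-01T10:10:00", "asset_id": "", "value": 7.0},
]
preview_problems = check_preview_rows(sample, "ts", "asset_id")
```

Unit and scope checks still need a human reviewer who knows the source; a script like this only catches mechanical problems.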

Step 4: Profile columns

Use the Profile tab to compute column statistics.

Review:

row count;
column names;
data types;
null ratio;
distinct values;
min and max values when available;
computed time.

Profile results help you decide whether the dataset is ready for validation or needs source repair.
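To make the reviewed statistics concrete, here is a small sketch of how null ratio, distinct count, and min and max can be derived for one column. It illustrates the definitions only and is not a description of how the Profile tab computes them; the function name and sample values are hypothetical.

```python
def profile_column(values):
    """Compute basic statistics for one column, treating None as null."""
    non_null = [v for v in values if v is not None]
    total = len(values)
    return {
        "row_count": total,
        # Share of rows where the column has no value.
        "null_ratio": (total - len(non_null)) / total if total else 0.0,
        "distinct": len(set(non_null)),
        "min": min(non_null) if non_null else None,
        "max": max(non_null) if non_null else None,
    }

# Hypothetical readings with one missing value.
profile = profile_column([6.8, 6.9, None, 6.8])
```

A high null ratio on a required time or identity column is usually a reason to return to the source rather than proceed to validation.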

Step 5: Assign a steward

Assign a steward before validation.

The steward is the person or team responsible for answering:

where the data came from;
which source system owns it;
how often it should refresh;
which columns are required;
which downstream workflows can use it;
which quality issues need source correction.

Use the steward field only after confirming the user ID format expected by the deployment.

Step 6: Validate the dataset

Validate after preview, profile, and stewardship are complete.

Validation checklist:

dataset name and description are clear;
source owner is known;
required columns are present;
timestamp and identity columns are correct;
quality issues are reviewed;
downstream consumers are known;
steward is assigned;
source refresh cadence is documented.

Validated datasets are better candidates for fusion tasks, BI reports, AI Agent workflows, and predictive maintenance workflows.
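The checklist can be kept as a simple review record so that unconfirmed items are explicit. This is a sketch under assumptions: the check identifiers and the `failed_checks` helper are invented for this example and do not correspond to product fields.

```python
# Hypothetical identifiers, one per checklist item above.
VALIDATION_CHECKS = [
    "name_and_description_clear",
    "source_owner_known",
    "required_columns_present",
    "timestamp_and_identity_correct",
    "quality_issues_reviewed",
    "downstream_consumers_known",
    "steward_assigned",
    "refresh_cadence_documented",
]

def failed_checks(review):
    """Return checklist items that are missing or not explicitly confirmed."""
    return [c for c in VALIDATION_CHECKS if review.get(c) is not True]

# Example review where two items are still open.
review = {check: True for check in VALIDATION_CHECKS}
review["steward_assigned"] = False
del review["refresh_cadence_documented"]
blocking = failed_checks(review)
```

Treating a missing answer the same as a failed one keeps incomplete evidence from passing silently.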

Import and reprocess checks

When a dataset is created from a file import, connector snapshot, or reprocessed source slice, compare the run totals before validation:

Check	What to confirm
Accepted rows	The accepted count matches the source owner's expected scope.
Rejected rows	Row-level errors are visible and assigned to the right owner for correction.
Schema match	Required fields, data types, and column names match the dataset contract.
Identity and time fields	Asset, equipment, event, or timestamp fields can support the intended workflow.
Reprocess result	Reprocessing a corrected source slice reduces the error count as expected and preserves lineage.

Keep failed import evidence until the replacement dataset has been validated and downstream consumers agree to use it.
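The accepted, rejected, and reprocess checks amount to comparing run totals. A minimal sketch follows, assuming the totals are read off the import or connector run by hand; the `RunTotals` type, the function name, and all numbers are illustrative assumptions.

```python
from dataclasses import dataclass

@dataclass
class RunTotals:
    accepted: int
    rejected: int

def review_reprocess(expected_rows, before, after):
    """Compare a reprocessed run with the original run and the expected scope."""
    findings = []
    if after.accepted != expected_rows:
        findings.append(f"accepted {after.accepted} rows, expected {expected_rows}")
    # A corrected slice should lower the rejected count.
    if before.rejected > 0 and after.rejected >= before.rejected:
        findings.append("reprocess did not reduce rejected rows")
    return findings

# Hypothetical totals: the rerun improved but is still short of scope.
first_run = RunTotals(accepted=9_940, rejected=60)
rerun = RunTotals(accepted=9_990, rejected=10)
run_findings = review_reprocess(10_000, first_run, rerun)
```

Any remaining finding is a reason to go back to the source owner before validation.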

Step 7: Review versions

Use the Versions tab when schema or source logic changes.

Review versions before changing:

column names;
column types;
time columns;
metric columns;
table name;
source query;
transformation logic.

When a version change affects downstream consumers, record the reason and coordinate with the owners of dependent workflows.
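When comparing two versions, it helps to list exactly which columns were removed, added, or retyped. The sketch below assumes each version's schema has been written down as a mapping of column name to type name; the helper and the sample schemas are hypothetical.

```python
def diff_schema(old, new):
    """Summarize column-level differences between two schema versions."""
    shared = set(old) & set(new)
    return {
        "removed": sorted(set(old) - set(new)),
        "added": sorted(set(new) - set(old)),
        "retyped": sorted(c for c in shared if old[c] != new[c]),
    }

# Hypothetical schemas: one column renamed, one column retyped.
v1 = {"ts": "timestamp", "asset_id": "string", "temp": "float"}
v2 = {"ts": "timestamp", "asset_id": "integer", "temp_c": "float"}
schema_changes = diff_schema(v1, v2)
```

A rename shows up as one removed and one added column, which is how a consumer that reads the old name will experience it.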

Step 8: Check lineage

Use the Lineage tab before changing or deprecating a dataset.

Look for:

upstream fusion task or pipeline;
fusion tasks that consume the dataset;
downstream governance pipelines;
output datasets produced from it.

Lineage is the quickest way to understand blast radius before making lifecycle changes.
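Blast radius here means everything reachable downstream of the dataset, directly or through intermediate outputs. As a sketch, assuming the lineage has been transcribed into a mapping from each node to its direct consumers (the node names and the function are hypothetical), a breadth-first traversal collects the full set:

```python
from collections import deque

def downstream_of(consumers, dataset):
    """Return every node reachable downstream of `dataset`."""
    seen = {dataset}
    queue = deque([dataset])
    while queue:
        node = queue.popleft()
        for child in consumers.get(node, []):
            if child not in seen:  # guards against cycles and repeats
                seen.add(child)
                queue.append(child)
    return seen - {dataset}

# Hypothetical lineage: node -> direct consumers.
lineage = {
    "chilled_water_history": ["fusion_task_a", "governance_pipeline_b"],
    "fusion_task_a": ["fused_asset_view"],
    "fused_asset_view": ["bi_report_energy"],
}
blast_radius = downstream_of(lineage, "chilled_water_history")
```

Indirect consumers such as the report in this example are the ones most easily missed when only direct consumers are checked.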

Step 9: Deprecate a dataset

Deprecate when a dataset is replaced, retired, or no longer suitable for reuse.

Before deprecating:

Check lineage.
Identify downstream owners.
Confirm the replacement dataset when one exists.
Record the reason.
Notify users before changing reports or workflows.

Deprecation marks the dataset as deprecated while preserving its review history.
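The pre-deprecation steps can be tracked as a list of open blockers. This is a minimal sketch with assumed inputs: a mapping of downstream consumers to their owners, the set of consumers whose owners have confirmed migration, and the recorded reason. None of these names come from the product.

```python
def deprecation_blockers(downstream_owners, confirmed, reason):
    """List what still prevents deprecating a dataset."""
    blockers = []
    if not reason.strip():
        blockers.append("no deprecation reason recorded")
    for consumer, owner in sorted(downstream_owners.items()):
        if consumer not in confirmed:
            blockers.append(f"{consumer}: waiting on {owner}")
    return blockers

# Hypothetical consumers and owners.
owners = {"bi_report_energy": "bi-team", "fusion_task_a": "data-eng"}
open_blockers = deprecation_blockers(
    owners,
    confirmed={"fusion_task_a"},
    reason="Replaced by normalized dataset",
)
```

An empty result means the recorded steps are complete; it does not replace notifying users.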

Failure handling

Do not force a dataset through validation when evidence is incomplete.

Problem	Action
Preview does not match source intent	Return to connector, import, or fusion output configuration.
Profile shows unexpected nulls or distinct values	Review source extraction, mappings, filters, and source-system changes.
Steward cannot be assigned	Keep the dataset in draft or limited pilot use until ownership is clear.
Validation fails	Record failed checks and the owner responsible for correction.
Deprecation affects reports or workflows	Keep the dataset active until downstream owners confirm migration.

Operating routine

For important datasets, review weekly:

latest preview;
profile changes;
lineage changes;
quality issues;
downstream complaints;
steward assignment;
status.
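For the profile-changes item, one option is to compare null ratios between two weekly profile snapshots. The sketch below assumes each snapshot is a mapping of column name to null ratio noted from the profile results; the function name and the 0.05 tolerance are arbitrary choices for illustration, not recommended values.

```python
def null_ratio_drift(previous, current, tolerance=0.05):
    """Return columns whose null ratio moved by more than `tolerance`."""
    drifted = {}
    for column, ratio in current.items():
        baseline = previous.get(column)
        # Columns with no baseline are new and are left to the version review.
        if baseline is not None and abs(ratio - baseline) > tolerance:
            drifted[column] = (baseline, ratio)
    return drifted

# Hypothetical snapshots from two consecutive weekly reviews.
last_week = {"ts": 0.0, "asset_id": 0.0, "value": 0.02}
this_week = {"ts": 0.0, "asset_id": 0.01, "value": 0.15}
drift = null_ratio_drift(last_week, this_week)
```

A flagged column is a prompt to check source extraction and mappings, not proof of a defect.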

Next step

Use Governance Studio when datasets and methods need a repeatable processing pipeline.
