Build an Operational Data Pipeline with DFS

Use this workflow when a team needs to turn source-system data into governed operational data for dashboards, Inspector workflows, predictive maintenance, Physical AI scenarios, or FactVerse AI Agent.

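The working path is: define the evidence contract, create and test a DFS Lite connector, browse and preview source data, review field mappings, sync and check quality, promote reusable data into DFS Pro, choose a fusion path when multiple sources are involved, resolve review items and rejected rows, then hand off to downstream workflows.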

When to use this workflow

Situation	Use DFS Lite	Add DFS Pro
One source feeds a live operational view	Connect, map, sync, and monitor the source.	Add a dataset when the data will be reused, versioned, or reviewed.
Multiple systems describe the same asset or event	Bring each source into a stable connector path.	Use datasets, fusion tasks, review queue, lineage, and audit records.
An AI workflow needs evidence	Confirm source freshness and mappings.	Create an Agent-ready dataset with steward, profile, quality notes, and allowed use.
Physical AI needs operating constraints	Map scene, equipment, signal, and process fields.	Package reusable scenario input data and keep source timestamps visible.

Prerequisites

Downstream workflow owner, source owner, data steward, and reviewer are identified.
Source systems, required fields, target identity, units, freshness window, and accepted use are documented.
The tenant has DFS Lite connector access, plus DFS Pro dataset access when reusable governed data is required.

Capability boundaries

DFS has product UI surfaces and AI-side services. Treat the UI as the operator workflow, and use the code-backed services as the current execution boundary.

Area	Current code-backed capability	Operator implication
DFS Lite source adapters	REST, CSV, MQTT, and OPC UA connector classes are implemented in the AI-side connector layer.	Confirm environment support before selecting project-specific connector templates such as building, database, or protocol variants.
Connection test	Connector test fetches a small sample or checks the broker/server path.	A passing test confirms reachability; mapping, preview, sync, and quality review still matter.
Field mapping suggestions	Mapping suggestion scores source schema against a target catalog using aliases, token overlap, type compatibility, and confidence.	Suggestions are review inputs. A person still confirms identity, units, transform expression, and target field.
DFS Pro dataset read path	Fusion reads dataset rows by tenant and dataset ID from registered DFS dataset tables, with row metadata carried into the frame.	Dataset IDs, table names, tenant ownership, and payload shape must be correct before fusion or Agent use.
Fusion execution	Registered methods include natural-key merge and real-over-synthetic gap fill.	Choose the method from the data problem, then record config, input dataset IDs, output dataset, run result, conflicts, and review state.
Review and audit	Frontend exposes review items, rejection queue, audit log, dataset versions, lineage, metrics, and BI report surfaces.	Use these pages to keep human decisions, rejected rows, replay status, and downstream use traceable.

Step 1: Define the evidence contract

Start with the downstream workflow before connector details.

Write down:

the business question or operating task;
source systems and source owners;
target asset, equipment, point, scene, or work-record identity;
required fields and accepted units;
expected timestamp field and freshness window;
known value ranges and quality limits;
reviewer and steward;
allowed downstream use, such as dashboard, Agent answer, work-order draft, simulation input, or report.

Examples:

Workflow	Evidence contract
Facility operations	Asset ID, point name, meter value, timestamp, unit, alarm state, work-order reference, source freshness.
Predictive maintenance	Equipment ID, signal history, component or subsystem, maintenance event, operating mode, anomaly label, engineer note.
Physical AI	Scene ID, model asset version, component geometry, process constraint, task sequence, operating signal, validation note.
Data center operations	Facility or room ID, asset ID, meter or status point, inspection record, energy calculation input, maintenance history.
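
The contract items above can be kept as a small structured record so that gaps are visible before connector work starts. The sketch below is a minimal illustration, not a DFS schema: the class name `EvidenceContract`, its field names, and the sample values are all assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class EvidenceContract:
    """Hypothetical record of the Step 1 items; not a DFS schema."""

    task: str
    sources: dict[str, str]  # source system -> source owner
    target_identity: str
    required_fields: dict[str, str]  # field name -> accepted unit
    timestamp_field: str
    freshness_window_minutes: int
    value_ranges: dict[str, tuple[float, float]]
    reviewer: str
    steward: str
    allowed_uses: list[str] = field(default_factory=list)

    def missing_items(self) -> list[str]:
        """Return the names of contract items that are still empty."""
        return [name for name, value in vars(self).items() if not value]


contract = EvidenceContract(
    task="Chiller alarm triage",
    sources={"BMS": "facilities team"},
    target_identity="asset_id",
    required_fields={"supply_temp": "degC"},
    timestamp_field="ts",
    freshness_window_minutes=15,
    value_ranges={"supply_temp": (-10.0, 60.0)},
    reviewer="",  # not yet assigned
    steward="j.doe",
)
print(contract.missing_items())  # ['reviewer', 'allowed_uses']
```

A contract with missing items is a signal to go back to the workflow owner before creating the connector.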

Step 2: Create and test the DFS Lite connector

Open:

Data Integration > Connectors

Create a connector with a name that identifies site, source system, and purpose.

Use the source type that is enabled for the environment. The AI-side connector layer currently implements:

Source	Prepare
REST	HTTPS URL, request parameters, headers, token or API key, response shape.
CSV	File path or uploaded file, delimiter, header row, timestamp column, key fields.
MQTT	Broker URL, topic pattern, QoS, credentials, payload format, sampling expectation.
OPC UA	Server URL, node IDs, username/password or certificate plan, sampling expectation.

Run Test Connection before saving or starting the connector. Save the connector once the test passes, then keep it paused while mapping or source meaning still needs review.
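
Before running Test Connection, it can help to confirm that the items in the Prepare column have actually been collected. The following sketch assumes hypothetical setting names (`url`, `broker_url`, `node_ids`, and so on) chosen for illustration; they are not the connector form's real field names.

```python
# Hypothetical setting names per source type, derived from the "Prepare" column.
REQUIRED_SETTINGS = {
    "rest": ["url", "response_shape"],
    "csv": ["path", "delimiter", "timestamp_column", "key_fields"],
    "mqtt": ["broker_url", "topic_pattern", "qos", "payload_format"],
    "opcua": ["server_url", "node_ids"],
}


def connector_problems(source_type: str, settings: dict) -> list[str]:
    """List settings that are still missing before Test Connection is run."""
    if source_type not in REQUIRED_SETTINGS:
        return [f"unsupported source type: {source_type}"]
    problems = [
        f"missing setting: {name}"
        for name in REQUIRED_SETTINGS[source_type]
        if settings.get(name) in (None, "", [], {})  # QoS 0 is a valid value
    ]
    url = str(settings.get("url", ""))
    if source_type == "rest" and not url.startswith("https://"):
        problems.append("REST URL should use HTTPS")
    return problems


mqtt_settings = {
    "broker_url": "mqtts://broker.example:8883",
    "topic_pattern": "site/+/temperature",
    "qos": 0,
    "payload_format": "json",
}
print(connector_problems("mqtt", mqtt_settings))  # []
```

An empty result only means the preparation list is complete; reachability is still established by the connection test itself.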

Step 3: Browse and preview source data

Open the connector detail page and use browse or preview before mapping.

Check:

expected equipment, tags, topics, tables, or files are visible;
sample values exist for the selected node or field;
timestamp fields are usable;
units and scale factors are known;
empty, stale, impossible, or duplicated values are visible before sync;
source owner can confirm the field meaning.

Record the selected source paths. They become mapping inputs and review evidence.
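
The preview checks for empty, stale, impossible, or duplicated values can also be run over an exported sample. This is a minimal sketch under stated assumptions: the row keys `point`, `ts`, and `value`, the freshness window, and the value range are illustrative and would come from the evidence contract in practice.

```python
from datetime import datetime, timedelta, timezone


def preview_findings(rows, now, freshness, value_range):
    """Flag empty, out-of-range, stale, and duplicated sample rows.

    Each row is a dict with the hypothetical keys 'point', 'ts'
    (timezone-aware datetime), and 'value' (number or None).
    """
    low, high = value_range
    seen = set()
    findings = []
    for index, row in enumerate(rows):
        value = row.get("value")
        if value is None:
            findings.append((index, "empty"))
        elif not low <= value <= high:
            findings.append((index, "out_of_range"))
        if now - row["ts"] > freshness:
            findings.append((index, "stale"))
        key = (row["point"], row["ts"])
        if key in seen:
            findings.append((index, "duplicate"))
        seen.add(key)
    return findings


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
sample = [
    {"point": "p1", "ts": now - timedelta(minutes=5), "value": 21.5},
    {"point": "p1", "ts": now - timedelta(minutes=5), "value": 21.5},
    {"point": "p2", "ts": now - timedelta(hours=2), "value": None},
    {"point": "p3", "ts": now - timedelta(minutes=1), "value": 999.0},
]
for finding in preview_findings(sample, now, timedelta(minutes=15), (-40.0, 120.0)):
    print(finding)
```

Findings of this kind are questions for the source owner, not automatic rejections.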

Step 4: Review field mappings

Open the mapping area for the connector.

For each source field, review:

Mapping field	Review question
Source path	Which tag, topic field, JSON path, node ID, or CSV column produced the value?
Source type	Is the source type numeric, string, timestamp, boolean, or enum?
Target entity	Does the value belong to an asset, point, dataset field, work record, or scene context?
Target field	Which canonical field will downstream workflows read?
Transform expression	Is unit conversion, enum normalization, scaling, or timestamp parsing required?
Confidence and rationale	Which tokens or type hints supported an AI mapping suggestion?

When AI mapping suggestions are enabled, use them as a review accelerator. The mapping service works from source schema and target catalog metadata; it does not use connector credentials or raw sample values.

Accept a suggestion after checking identity and units. For high-impact workflows, keep a note that explains why the target field was accepted.
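
To make the confidence and rationale columns easier to read, here is a rough sketch of how a score could combine alias matches, token overlap, and type compatibility. It is not the DFS mapping service: the function names, the catalog shape, and the 0.7/0.3 weights are assumptions chosen only to illustrate the idea.

```python
import re


def tokens(name: str) -> set[str]:
    """Split 'SupplyAirTemp' or 'supply_air_temp' into lowercase tokens."""
    spaced = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", name)
    return set(re.findall(r"[a-z0-9]+", spaced.lower()))


def suggest(source_field: dict, catalog: list[dict]) -> tuple:
    """Return (target name, confidence, rationale) for the best catalog match."""
    best = (None, 0.0, "no match")
    src_tokens = tokens(source_field["name"])
    for target in catalog:
        aliases = [alias.lower() for alias in target["aliases"]]
        if source_field["name"].lower() in aliases:
            name_score, why = 1.0, "alias match"
        else:
            tgt_tokens = tokens(target["name"])
            shared = src_tokens & tgt_tokens
            union = src_tokens | tgt_tokens
            name_score = len(shared) / len(union) if union else 0.0
            why = f"token overlap {sorted(shared)}"
        type_score = 1.0 if source_field["type"] == target["type"] else 0.0
        # Illustrative weights only; the real scoring is not documented here.
        confidence = round(0.7 * name_score + 0.3 * type_score, 2)
        if confidence > best[1]:
            best = (target["name"], confidence, why)
    return best


catalog = [
    {"name": "supply_air_temperature", "type": "number", "aliases": ["SAT", "SupplyAirTemp"]},
    {"name": "alarm_state", "type": "enum", "aliases": []},
]
print(suggest({"name": "SupplyAirTemp", "type": "number"}, catalog))
print(suggest({"name": "air_temp_supply", "type": "string"}, catalog))
```

The second call shows why review still matters: the names partly overlap, but the type mismatch and low confidence mean a person has to confirm identity and units before accepting the target field.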

Step 5: Sync and check quality

Start the connector or trigger on-demand sync.

Open:

Data Integration > Sync History
Data Integration > Quality

Review:

sync status and error message;
rows read, rows written, failed rows, and latest timestamp;
connector quality score;
completeness, timeliness, and accuracy indicators when available;
quota usage for connectors and daily points;
latest failed field, failed row, or rejected payload.

Use the data for downstream workflows only after the source owner and workflow reviewer understand the quality state.
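
The review items above can be turned into a simple gate over one sync history entry. The sketch below assumes a hypothetical run summary with the keys `status`, `error`, `rows_read`, `rows_failed`, and `latest_timestamp`, and an illustrative 1% failed-row threshold; neither reflects a documented DFS response format.

```python
from datetime import datetime, timedelta, timezone


def sync_findings(run: dict, now: datetime, freshness: timedelta,
                  max_failed_ratio: float = 0.01) -> list[str]:
    """Return human-readable findings for one sync run summary."""
    findings = []
    if run["status"] != "success":
        findings.append(f"status {run['status']}: {run.get('error') or 'no error message'}")
    if run["rows_read"] == 0:
        findings.append("no rows read")
    elif run["rows_failed"] / run["rows_read"] > max_failed_ratio:
        findings.append("failed-row ratio above threshold")
    if now - run["latest_timestamp"] > freshness:
        findings.append("latest timestamp outside freshness window")
    return findings


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
run = {
    "status": "failed",
    "error": "timeout",
    "rows_read": 100,
    "rows_failed": 10,
    "latest_timestamp": now - timedelta(hours=2),
}
for line in sync_findings(run, now, timedelta(minutes=15)):
    print(line)
```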

Step 6: Promote reusable data into DFS Pro

Open:

Data Integration > Datasets

Create a DFS Pro dataset when the data needs reuse, versioning, lineage, fusion, BI, or Agent access.

The dataset description should include:

source connector or source system;
business purpose;
owner and steward;
refresh cadence;
key identity fields;
required timestamp field;
known limitations;
allowed downstream workflows.

After creation, open the dataset detail page and run:

Check	Purpose
Preview	Confirm representative rows and payload shape.
Profile	Check columns, null ratio, distinct IDs, timestamp range, and outliers.
Validate	Mark the dataset ready after schema and identity review.
Versions	Keep change history visible before replacing a dataset.
Lineage	Show source connector, transform, fusion, and output relationships.
Change impact	Review downstream impact before changing a dataset version.
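
As a rough idea of what the Profile check reports, the sketch below computes columns, null ratio, distinct IDs, and timestamp range over previewed rows. It is an illustration in plain Python, not the DFS profiler; the function name and row shape are assumptions, and outlier detection is left out.

```python
from datetime import datetime


def profile(rows: list[dict], id_field: str, ts_field: str) -> dict:
    """Summarize previewed rows: columns, null ratio, distinct IDs, time range."""
    if not rows:
        raise ValueError("cannot profile an empty preview")
    columns = sorted({key for row in rows for key in row})
    null_ratio = {
        col: sum(1 for row in rows if row.get(col) is None) / len(rows)
        for col in columns
    }
    timestamps = [row[ts_field] for row in rows if row.get(ts_field) is not None]
    return {
        "columns": columns,
        "null_ratio": null_ratio,
        "distinct_ids": len({row.get(id_field) for row in rows} - {None}),
        "timestamp_range": (min(timestamps), max(timestamps)) if timestamps else None,
    }


rows = [
    {"asset_id": "A", "ts": datetime(2024, 1, 1, 0), "value": 1.0},
    {"asset_id": "A", "ts": datetime(2024, 1, 1, 1), "value": None},
    {"asset_id": "B", "ts": datetime(2024, 1, 1, 2), "value": 3.0},
    {"asset_id": None, "ts": datetime(2024, 1, 1, 3), "value": 4.0},
]
summary = profile(rows, id_field="asset_id", ts_field="ts")
print(summary["null_ratio"])  # {'asset_id': 0.25, 'ts': 0.0, 'value': 0.25}
```

A non-zero null ratio on an identity field, as in this sample, is worth resolving before the dataset is marked validated.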

Step 7: Choose a fusion path when multiple sources are involved

Open:

Data Integration > Methods
Data Integration > Data Fusion

Choose the fusion method from the data problem.

Method key	Use when	Required review
`merge_by_natural_key`	Several datasets describe the same asset, event, fault, work order, or record using stable natural keys or aliases.	Confirm natural key fields, source priority, comparable output fields, conflicts, and canonical source selection.
`prefer_real_over_synthetic_with_gap_fill`	A real dataset and a synthetic or calculated dataset share keys, and real values should take priority while the second dataset fills gaps.	Confirm key columns, value columns, tolerance percentage, source provenance, and conflict count.
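
To illustrate the real-over-synthetic idea, the following sketch prefers real values, fills gaps from the synthetic dataset, and counts a conflict when both values exist and differ by more than the tolerance percentage. It is one possible reading of the method description, not the registered implementation: the function name `gap_fill`, the `_provenance` column, and the conflict rule are assumptions.

```python
def gap_fill(real_rows, synthetic_rows, key_columns, value_columns, tolerance_pct):
    """Prefer real values, fill gaps from synthetic rows, and count conflicts."""

    def key_of(row):
        return tuple(row[col] for col in key_columns)

    real_by_key = {key_of(row): row for row in real_rows}
    synthetic_by_key = {key_of(row): row for row in synthetic_rows}
    output, conflicts = [], 0
    for key in sorted(set(real_by_key) | set(synthetic_by_key)):
        real = real_by_key.get(key, {})
        synthetic = synthetic_by_key.get(key, {})
        row = dict(zip(key_columns, key))
        provenance = {}
        for col in value_columns:
            real_value, synthetic_value = real.get(col), synthetic.get(col)
            if real_value is not None:
                row[col], provenance[col] = real_value, "real"
                # Assumed rule: conflict when the gap exceeds tolerance_pct of the real value.
                if (synthetic_value is not None
                        and abs(real_value - synthetic_value) > abs(real_value) * tolerance_pct / 100):
                    conflicts += 1
            elif synthetic_value is not None:
                row[col], provenance[col] = synthetic_value, "synthetic"
            else:
                row[col], provenance[col] = None, "missing"
        row["_provenance"] = provenance
        output.append(row)
    return output, conflicts


real = [
    {"asset": "A", "hour": 1, "kwh": 10.0},
    {"asset": "A", "hour": 2, "kwh": None},
]
synthetic = [
    {"asset": "A", "hour": 1, "kwh": 12.0},
    {"asset": "A", "hour": 2, "kwh": 11.0},
    {"asset": "A", "hour": 3, "kwh": 9.0},
]
fused, conflict_count = gap_fill(real, synthetic, ["asset", "hour"], ["kwh"], tolerance_pct=5)
print([row["kwh"] for row in fused], conflict_count)  # [10.0, 11.0, 9.0] 1
```

Keeping per-column provenance in the output is what lets a reviewer later see which values were measured and which were filled.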

For fusion task configuration, record:

input dataset IDs;
output dataset name;
method key;
method configuration;
run ID and status;
total, matched, and conflict counts;
reviewer decision;
unresolved conflicts or rejected rows.
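
A fusion run note covering the items above can be as simple as a JSON document stored next to the output dataset. The keys, IDs, and counts below are placeholders invented for illustration; they do not describe a DFS export format.

```python
import json

# Placeholder values throughout; key names are illustrative, not a DFS format.
run_record = {
    "input_dataset_ids": ["ds-real-001", "ds-synth-002"],
    "output_dataset_name": "chiller_energy_fused",
    "method_key": "prefer_real_over_synthetic_with_gap_fill",
    "method_config": {
        "key_columns": ["asset", "hour"],
        "value_columns": ["kwh"],
        "tolerance_pct": 5,
    },
    "run_id": "run-0001",
    "status": "completed",
    "counts": {"total": 3, "matched": 1, "conflicts": 1},
    "reviewer_decision": None,  # filled in after review
    "unresolved": ["asset A, hour 1: real and synthetic kwh differ"],
}


def ready_to_close(record: dict) -> bool:
    """A run is closed once a decision exists and nothing is left unresolved."""
    return record["reviewer_decision"] is not None and not record["unresolved"]


print(json.dumps(run_record, indent=2))
print(ready_to_close(run_record))  # False
```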

Step 8: Resolve review items and rejected rows

Open:

Data Integration > Reviews
Data Integration > Audit
Data Integration > Metrics

Use the review queue for conflicts, uncertain matches, low-confidence records, and manual decisions.

For rejected ingest rows, keep the source row evidence visible:

source;
source file or cursor;
correlation ID;
row index;
raw payload;
reason and field errors;
state such as pending, acknowledged, fixed in source, or reprocessed;
reviewer note.
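
The rejected-row evidence listed above maps naturally onto a small record type with an explicit state. The sketch below is hypothetical: the class and field names are made up for this example, and the rule that any row past `pending` should carry a reviewer note is an assumption, not documented DFS behavior.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class RejectState(Enum):
    PENDING = "pending"
    ACKNOWLEDGED = "acknowledged"
    FIXED_IN_SOURCE = "fixed in source"
    REPROCESSED = "reprocessed"


@dataclass
class RejectedRow:
    """Hypothetical shape for one rejected ingest row."""

    source: str
    source_file_or_cursor: str
    correlation_id: str
    row_index: int
    raw_payload: str
    reason: str
    field_errors: dict[str, str]
    state: RejectState = RejectState.PENDING
    reviewer_note: str = ""


def state_counts(rows: list[RejectedRow]) -> dict[str, int]:
    """Count rejected rows per state for a review summary."""
    return dict(Counter(row.state.value for row in rows))


def missing_notes(rows: list[RejectedRow]) -> list[str]:
    """Correlation IDs of rows past 'pending' that have no reviewer note."""
    return [
        row.correlation_id
        for row in rows
        if row.state is not RejectState.PENDING and not row.reviewer_note
    ]


rejected = [
    RejectedRow("bms-csv", "points_0101.csv", "c-1", 14, "A1,,degC",
                "missing value", {"value": "required"}),
    RejectedRow("bms-csv", "points_0101.csv", "c-2", 15, "A2,abc,degC",
                "type error", {"value": "not numeric"},
                state=RejectState.FIXED_IN_SOURCE),
]
print(state_counts(rejected))   # {'pending': 1, 'fixed in source': 1}
print(missing_notes(rejected))  # ['c-2']
```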

Use audit log and metrics to show who changed a dataset, method, fusion task, review item, or replay state.

Step 9: Hand off to downstream workflows

Before handoff, package the data evidence in a short run note.

Destination	Include
Dashboard or BI report	Dataset ID, columns, filter assumptions, refresh cadence, and owner.
Inspector or work-order workflow	Asset ID mapping, latest work records, inspection references, allowed draft action, and reviewer.
FactVerse AI Agent	Dataset ID/version, source timestamp, quality note, steward, allowed answer type, and MCP scope plan.
Predictive maintenance	Equipment ID, signal window, feature columns, maintenance history, anomaly labels, and engineer review state.
Physical AI	Scene ID, model asset version, component geometry, process constraints, simulation input dataset, and validation notes.

Acceptance checklist

The connector test passed in the target environment.
Browse and preview confirmed source paths and sample values.
Mappings were reviewed for identity, units, timestamp, and transform expression.
Sync history shows the latest successful or explainable run.
Data quality state is visible to the workflow owner.
DFS Pro dataset has steward, profile, validation state, and lineage.
Fusion tasks record method key, config, input datasets, output dataset, and conflicts.
Review queue decisions are resolved or explicitly carried forward.
Downstream workflows receive dataset ID, source timestamps, quality notes, and reviewer boundary.

Related pages

Page	Use
DFS Lite Connectors	Configure and monitor source connectors.
Mapping Source Fields	Map source fields to target entities and review suggestions.
DFS Pro Datasets	Create and validate reusable datasets.
DFS Pro Fusion Tasks	Run multi-source merge, match, and gap-fill workflows.
Review Queue	Resolve conflicts and human-review items.
Create an AI Agent-Ready Dataset	Prepare data evidence for Agent workflows.
