Build an Operational Data Pipeline with DFS

Use this workflow when a team needs to turn source-system data into governed operational data for dashboards, Inspector workflows, predictive maintenance, Physical AI scenarios, or FactVerse AI Agent.

The working path is:

Define evidence contract -> create DFS Lite connector -> browse and preview
-> review mappings -> sync -> check quality -> create DFS Pro dataset
-> validate/profile/lineage -> fuse if needed -> hand off with evidence

When to use this workflow

| Situation | Use DFS Lite | Add DFS Pro |
| --- | --- | --- |
| One source feeds a live operational view | Connect, map, sync, and monitor the source. | Add a dataset when the data will be reused, versioned, or reviewed. |
| Multiple systems describe the same asset or event | Bring each source into a stable connector path. | Use datasets, fusion tasks, review queue, lineage, and audit records. |
| An AI workflow needs evidence | Confirm source freshness and mappings. | Create an Agent-ready dataset with steward, profile, quality notes, and allowed use. |
| Physical AI needs operating constraints | Map scene, equipment, signal, and process fields. | Package reusable scenario input data and keep source timestamps visible. |

Capability boundaries

DFS has product UI surfaces and AI-side services. Treat the UI as the operator workflow, and treat the code-backed services listed below as the boundary of what currently executes.

| Area | Current code-backed capability | Operator implication |
| --- | --- | --- |
| DFS Lite source adapters | REST, CSV, MQTT, and OPC UA connector classes are implemented in the AI-side connector layer. | Confirm environment support before selecting project-specific connector templates such as building, database, or protocol variants. |
| Connection test | Connector test fetches a small sample or checks the broker/server path. | A passing test confirms reachability; mapping, preview, sync, and quality review still matter. |
| Field mapping suggestions | Mapping suggestion scores source schema against a target catalog using aliases, token overlap, type compatibility, and confidence. | Suggestions are review inputs. A person still confirms identity, units, transform expression, and target field. |
| DFS Pro dataset read path | Fusion reads dataset rows by tenant and dataset ID from registered DFS dataset tables, with row metadata carried into the frame. | Dataset IDs, table names, tenant ownership, and payload shape must be correct before fusion or Agent use. |
| Fusion execution | Registered methods include natural-key merge and real-over-synthetic gap fill. | Choose the method from the data problem, then record config, input dataset IDs, output dataset, run result, conflicts, and review state. |
| Review and audit | Frontend exposes review items, rejection queue, audit log, dataset versions, lineage, metrics, and BI report surfaces. | Use these pages to keep human decisions, rejected rows, replay status, and downstream use traceable. |

Step 1: Define the evidence contract

Start with the downstream workflow before connector details.

Write down:

  • the business question or operating task;
  • source systems and source owners;
  • target asset, equipment, point, scene, or work-record identity;
  • required fields and accepted units;
  • expected timestamp field and freshness window;
  • known value ranges and quality limits;
  • reviewer and steward;
  • allowed downstream use, such as dashboard, Agent answer, work-order draft, simulation input, or report.
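
The items above can be captured as a small record so that gaps are visible before any connector work starts. This is a minimal sketch: `EvidenceContract` and all of its field names are hypothetical and are not a DFS schema.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class EvidenceContract:
    """Hypothetical record of the Step 1 items; not a DFS schema."""
    task: str
    source_systems: dict[str, str]       # source system -> source owner
    target_identity: str                 # asset, equipment, point, scene, or work record
    required_fields: dict[str, str]      # field name -> accepted unit
    timestamp_field: str
    freshness_window_s: int
    value_ranges: dict[str, tuple[float, float]] = field(default_factory=dict)
    reviewer: str = ""
    steward: str = ""
    allowed_uses: list[str] = field(default_factory=list)

    def missing_items(self) -> list[str]:
        """Return the names of contract items that are still empty."""
        required = ["task", "source_systems", "target_identity", "required_fields",
                    "timestamp_field", "reviewer", "steward", "allowed_uses"]
        return [name for name in required if not getattr(self, name)]


contract = EvidenceContract(
    task="Track chiller energy use per floor",
    source_systems={"BMS": "facilities team"},
    target_identity="asset_id",
    required_fields={"meter_value": "kWh"},
    timestamp_field="ts",
    freshness_window_s=900,
    reviewer="ops lead",
    allowed_uses=["dashboard"],
)
print(contract.missing_items())  # the steward has not been named yet
```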

Examples:

| Workflow | Evidence contract |
| --- | --- |
| Facility operations | Asset ID, point name, meter value, timestamp, unit, alarm state, work-order reference, source freshness. |
| Predictive maintenance | Equipment ID, signal history, component or subsystem, maintenance event, operating mode, anomaly label, engineer note. |
| Physical AI | Scene ID, model asset version, component geometry, process constraint, task sequence, operating signal, validation note. |
| Data center operations | Facility or room ID, asset ID, meter or status point, inspection record, energy calculation input, maintenance history. |

Step 2: Create and test the DFS Lite connector

Open:

Data Integration > Connectors

Create a connector with a name that identifies site, source system, and purpose.

Use a source type that is enabled for the environment. The AI-side connector layer currently implements:

| Source | Prepare |
| --- | --- |
| REST | HTTPS URL, request parameters, headers, token or API key, response shape. |
| CSV | File path or uploaded file, delimiter, header row, timestamp column, key fields. |
| MQTT | Broker URL, topic pattern, QoS, credentials, payload format, sampling expectation. |
| OPC UA | Server URL, node IDs, username/password or certificate plan, sampling expectation. |

Run Test Connection before saving or starting the connector. Save the connector after the test passes, and keep it paused while mappings or source meaning still need review.
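
Before opening the connector form, it can help to check that the preparation items for the chosen source type are in hand. The sketch below assumes hypothetical setting names (`broker_url`, `topic_pattern`, and so on) that paraphrase the preparation table; they are not the connector's actual configuration keys.

```python
# Hypothetical setting names derived from the preparation table; the real
# connector form may use different keys.
REQUIRED_SETTINGS = {
    "rest": {"url", "auth", "response_shape"},
    "csv": {"path", "delimiter", "timestamp_column", "key_fields"},
    "mqtt": {"broker_url", "topic_pattern", "qos", "payload_format"},
    "opcua": {"server_url", "node_ids", "auth"},
}


def missing_settings(source_type: str, settings: dict) -> list[str]:
    """List preparation items that are absent or blank for a source type."""
    if source_type not in REQUIRED_SETTINGS:
        raise ValueError(f"unsupported source type: {source_type}")
    # Treat None, empty string, and empty list as "not prepared"; 0 is valid (QoS 0).
    provided = {k for k, v in settings.items() if v not in (None, "", [])}
    return sorted(REQUIRED_SETTINGS[source_type] - provided)


print(missing_settings("mqtt", {
    "broker_url": "mqtts://broker.example:8883",
    "topic_pattern": "site/+/temp",
    "qos": 0,
}))
```

An empty result only means the inputs are ready; reachability is still confirmed by Test Connection in the target environment.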

Step 3: Browse and preview source data

Open the connector detail page and use browse or preview before mapping.

Check:

  • expected equipment, tags, topics, tables, or files are visible;
  • sample values exist for the selected node or field;
  • timestamp fields are usable;
  • units and scale factors are known;
  • empty, stale, impossible, or duplicated values are visible before sync;
  • source owner can confirm the field meaning.

Record the selected source paths. They become mapping inputs and review evidence.
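
The empty, stale, and duplicated checks above can also be run over an exported preview sample. This is a sketch under assumptions: the row shape (`tag`, `ts`, `value`) is hypothetical, timestamps are ISO 8601 strings with an explicit offset, and range checks for impossible values are left out.

```python
from datetime import datetime, timedelta, timezone


def preview_findings(rows, key_field, ts_field, value_field, now, max_age):
    """Count empty values, stale timestamps, and duplicated (key, timestamp) pairs."""
    findings = {"empty": 0, "stale": 0, "duplicated": 0}
    seen = set()
    for row in rows:
        if row.get(value_field) in (None, ""):
            findings["empty"] += 1
        if now - datetime.fromisoformat(row[ts_field]) > max_age:
            findings["stale"] += 1
        ident = (row[key_field], row[ts_field])
        if ident in seen:
            findings["duplicated"] += 1
        seen.add(ident)
    return findings


sample = [
    {"tag": "AHU-1.supply_temp", "ts": "2024-05-01T10:00:00+00:00", "value": 18.4},
    {"tag": "AHU-1.supply_temp", "ts": "2024-05-01T10:00:00+00:00", "value": 18.4},
    {"tag": "AHU-2.supply_temp", "ts": "2024-05-01T10:00:00+00:00", "value": None},
    {"tag": "AHU-3.supply_temp", "ts": "2024-04-30T08:00:00+00:00", "value": 17.9},
]
now = datetime(2024, 5, 1, 10, 5, tzinfo=timezone.utc)
result = preview_findings(sample, "tag", "ts", "value", now, timedelta(minutes=15))
print(result)
```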

Step 4: Review field mappings

Open the mapping area for the connector.

For each source field, review:

| Mapping field | Review question |
| --- | --- |
| Source path | Which tag, topic field, JSON path, node ID, or CSV column produced the value? |
| Source type | Is the source type numeric, string, timestamp, boolean, or enum? |
| Target entity | Does the value belong to an asset, point, dataset field, work record, or scene context? |
| Target field | Which canonical field will downstream workflows read? |
| Transform expression | Is unit conversion, enum normalization, scaling, or timestamp parsing required? |
| Confidence and rationale | Which tokens or type hints supported an AI mapping suggestion? |

When AI mapping suggestions are enabled, use them as a review accelerator. The mapping service works from source schema and target catalog metadata; it does not use connector credentials or raw sample values.

Accept a suggestion after checking identity and units. For high-impact workflows, keep a note that explains why the target field was accepted.
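
To make the confidence and rationale columns easier to read, the sketch below shows one way a suggestion could be scored from aliases, token overlap, and type compatibility. It is an illustration only, not the mapping service's algorithm: the weights (0.7 and 0.3), the catalog shape, and the function names are all assumptions. Like the service, it uses schema metadata only, never sample values.

```python
import re


def tokens(name: str) -> set:
    """Split a field name into lowercase alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", name.lower()))


def suggest(source_field: str, source_type: str, catalog: list) -> dict:
    """Return the best-scoring target field with a confidence and rationale."""
    best = None
    for target in catalog:
        shared = tokens(source_field) & tokens(target["name"])
        union = tokens(source_field) | tokens(target["name"])
        if source_field.lower() in target["aliases"]:
            overlap, rationale = 1.0, "alias match"
        else:
            overlap = len(shared) / len(union) if union else 0.0
            rationale = "shared tokens: " + ", ".join(sorted(shared))
        type_ok = 1.0 if source_type == target["type"] else 0.0
        confidence = round(0.7 * overlap + 0.3 * type_ok, 2)  # assumed weights
        if best is None or confidence > best["confidence"]:
            best = {"target": target["name"], "confidence": confidence,
                    "rationale": rationale}
    return best


catalog = [
    {"name": "supply_air_temperature", "type": "numeric",
     "aliases": {"supply_temp_c", "sat"}},
    {"name": "return_air_temperature", "type": "numeric", "aliases": {"rat"}},
]
print(suggest("supply_temp_c", "numeric", catalog))
print(suggest("return_temperature", "string", catalog))
```

A high score from a sketch like this still says nothing about units or identity, which is why the reviewer checks remain.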

Step 5: Sync and check quality

Start the connector or trigger on-demand sync.

Open:

Data Integration > Sync History
Data Integration > Quality

Review:

  • sync status and error message;
  • rows read, rows written, failed rows, and latest timestamp;
  • connector quality score;
  • completeness, timeliness, and accuracy indicators when available;
  • quota usage for connectors and daily points;
  • latest failed field, failed row, or rejected payload.

Use the data in downstream workflows only after the source owner and workflow reviewer understand the quality state.
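
As a rough cross-check of the Quality page, completeness and timeliness can be recomputed from the sync history figures. The definitions below are assumptions made for illustration (completeness as rows written over rows read, timeliness as latest timestamp within the freshness window); the product's own indicators may be calculated differently.

```python
from datetime import datetime, timedelta, timezone


def sync_quality(rows_read, rows_written, failed_rows, latest_ts, now, freshness_window):
    """Derive simple quality indicators from one sync run (assumed definitions)."""
    completeness = rows_written / rows_read if rows_read else 0.0
    return {
        "completeness": round(completeness, 3),
        "failed_rows": failed_rows,
        "timely": (now - latest_ts) <= freshness_window,
    }


quality = sync_quality(
    rows_read=200,
    rows_written=190,
    failed_rows=10,
    latest_ts=datetime(2024, 5, 1, 10, 0, tzinfo=timezone.utc),
    now=datetime(2024, 5, 1, 10, 20, tzinfo=timezone.utc),
    freshness_window=timedelta(minutes=15),
)
print(quality)
```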

Step 6: Promote reusable data into DFS Pro

Open:

Data Integration > Datasets

Create a DFS Pro dataset when the data needs reuse, versioning, lineage, fusion, BI, or Agent access.

The dataset description should include:

  • source connector or source system;
  • business purpose;
  • owner and steward;
  • refresh cadence;
  • key identity fields;
  • required timestamp field;
  • known limitations;
  • allowed downstream workflows.

After creation, open the dataset detail page and run:

| Check | Purpose |
| --- | --- |
| Preview | Confirm representative rows and payload shape. |
| Profile | Check columns, null ratio, distinct IDs, timestamp range, and outliers. |
| Validate | Mark the dataset ready after schema and identity review. |
| Versions | Keep change history visible before replacing a dataset. |
| Lineage | Show source connector, transform, fusion, and output relationships. |
| Change impact | Review downstream impact before changing a dataset version. |
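
To show what the Profile check reports, the sketch below computes columns, null ratio, distinct IDs, and timestamp range over a list of row dictionaries. The row shape and field names are hypothetical, timestamps are assumed to be ISO 8601 strings in one consistent format so that they sort as text, and outlier detection is omitted.

```python
def profile(rows, id_field, ts_field):
    """Summarize columns, null ratio per column, distinct IDs, and timestamp range."""
    columns = sorted({col for row in rows for col in row})
    count = len(rows)
    null_ratio = {
        col: (sum(1 for row in rows if row.get(col) is None) / count) if count else 0.0
        for col in columns
    }
    timestamps = [row[ts_field] for row in rows if row.get(ts_field) is not None]
    return {
        "columns": columns,
        "null_ratio": null_ratio,
        "distinct_ids": len({row[id_field] for row in rows
                             if row.get(id_field) is not None}),
        "timestamp_range": (min(timestamps), max(timestamps)) if timestamps else None,
    }


rows = [
    {"asset_id": "P-1", "ts": "2024-05-01T10:00:00+00:00", "kwh": 12.5},
    {"asset_id": "P-1", "ts": "2024-05-01T10:15:00+00:00", "kwh": None},
    {"asset_id": "P-2", "ts": "2024-05-01T10:00:00+00:00", "kwh": 7.0},
    {"asset_id": "P-2", "ts": "2024-05-01T10:30:00+00:00", "kwh": 7.2},
]
summary = profile(rows, "asset_id", "ts")
print(summary)
```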

Step 7: Choose a fusion path when multiple sources are involved

Open:

Data Integration > Methods
Data Integration > Data Fusion

Choose the fusion method from the data problem.

| Method key | Use when | Required review |
| --- | --- | --- |
| merge_by_natural_key | Several datasets describe the same asset, event, fault, work order, or record using stable natural keys or aliases. | Confirm natural key fields, source priority, comparable output fields, conflicts, and canonical source selection. |
| prefer_real_over_synthetic_with_gap_fill | A real dataset and a synthetic or calculated dataset share keys, and real values should take priority while the second dataset fills gaps. | Confirm key columns, value columns, tolerance percentage, source provenance, and conflict count. |
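
The real-over-synthetic idea can be illustrated with a short sketch: keep the real value when it exists, fill from the synthetic dataset otherwise, and count a conflict when both exist but differ by more than the tolerance percentage. This is not the registered method's implementation. The function name, row shape, `provenance` column, and the way the percentage difference is computed are assumptions.

```python
def real_over_synthetic(real, synthetic, key, value, tolerance_pct):
    """Prefer real values, gap-fill from synthetic, and count conflicts."""
    real_by_key = {row[key]: row for row in real}
    synth_by_key = {row[key]: row for row in synthetic}
    out, matched, conflicts = [], 0, 0
    for k in sorted(real_by_key.keys() | synth_by_key.keys()):
        r = real_by_key.get(k, {}).get(value)
        s = synth_by_key.get(k, {}).get(value)
        if r is not None:
            out.append({key: k, value: r, "provenance": "real"})
            if s is not None:
                matched += 1
                base = abs(r) or 1.0  # avoid dividing by zero
                if abs(r - s) / base * 100 > tolerance_pct:
                    conflicts += 1
        elif s is not None:
            out.append({key: k, value: s, "provenance": "synthetic"})
    return out, {"total": len(out), "matched": matched, "conflicts": conflicts}


real = [{"asset_id": "P-1", "kwh": 100.0}, {"asset_id": "P-2", "kwh": None}]
synthetic = [{"asset_id": "P-1", "kwh": 112.0},
             {"asset_id": "P-2", "kwh": 50.0},
             {"asset_id": "P-3", "kwh": 20.0}]
fused, counts = real_over_synthetic(real, synthetic, "asset_id", "kwh", tolerance_pct=5)
print(counts)
```

Keeping a provenance value on every output row is what lets a reviewer confirm later which values were measured and which were filled.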

For fusion task configuration, record:

  • input dataset IDs;
  • output dataset name;
  • method key;
  • method configuration;
  • run ID and status;
  • total, matched, and conflict counts;
  • reviewer decision;
  • unresolved conflicts or rejected rows.

Step 8: Resolve review items and rejected rows

Open:

Data Integration > Reviews
Data Integration > Audit
Data Integration > Metrics

Use the review queue for conflicts, uncertain matches, low-confidence records, and manual decisions.

For rejected ingest rows, keep the source row evidence visible:

  • source;
  • source file or cursor;
  • correlation ID;
  • row index;
  • raw payload;
  • reason and field errors;
  • state such as pending, acknowledged, fixed in source, or reprocessed;
  • reviewer note.
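
The evidence fields above map naturally onto a small record with an explicit state. The sketch below is hypothetical: the class and field names are illustrative and are not the rejection queue's schema, and the rule that every state change needs a reviewer note is an assumption chosen to keep decisions traceable.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class RejectionState(Enum):
    PENDING = "pending"
    ACKNOWLEDGED = "acknowledged"
    FIXED_IN_SOURCE = "fixed in source"
    REPROCESSED = "reprocessed"


@dataclass
class RejectedRow:
    """Hypothetical evidence record for one rejected ingest row."""
    source: str
    source_cursor: str            # source file or cursor
    correlation_id: str
    row_index: int
    raw_payload: str
    reason: str
    field_errors: dict[str, str] = field(default_factory=dict)
    state: RejectionState = RejectionState.PENDING
    reviewer_note: str = ""

    def transition(self, new_state: RejectionState, note: str) -> None:
        """Change state; a note is required here (an assumed policy)."""
        if not note:
            raise ValueError("a reviewer note is required for a state change")
        self.state = new_state
        self.reviewer_note = note


row = RejectedRow(
    source="meters.csv",
    source_cursor="meters.csv:2024-05-01",
    correlation_id="run-0001",
    row_index=42,
    raw_payload='{"asset_id": "P-1", "kwh": "n/a"}',
    reason="type error",
    field_errors={"kwh": "expected a number"},
)
row.transition(RejectionState.ACKNOWLEDGED, "Source owner notified.")
print(row.state.value, "-", row.reviewer_note)
```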

Use audit log and metrics to show who changed a dataset, method, fusion task, review item, or replay state.

Step 9: Hand off to downstream workflows

Before handoff, package the data evidence in a short run note.

| Destination | Include |
| --- | --- |
| Dashboard or BI report | Dataset ID, columns, filter assumptions, refresh cadence, and owner. |
| Inspector or work-order workflow | Asset ID mapping, latest work records, inspection references, allowed draft action, and reviewer. |
| FactVerse AI Agent | Dataset ID/version, source timestamp, quality note, steward, allowed answer type, and MCP scope plan. |
| Predictive maintenance | Equipment ID, signal window, feature columns, maintenance history, anomaly labels, and engineer review state. |
| Physical AI | Scene ID, model asset version, component geometry, process constraints, simulation input dataset, and validation notes. |
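
A run note can be as simple as a JSON document that carries the items listed for the destination. The sketch below covers the FactVerse AI Agent row only; the key names and the example values are hypothetical, and no particular note format is required by DFS.

```python
import json

# Hypothetical key names for the Agent handoff items.
AGENT_NOTE_FIELDS = ["dataset_id", "dataset_version", "source_timestamp",
                     "quality_note", "steward", "allowed_answer_type",
                     "mcp_scope_plan"]


def agent_run_note(**evidence) -> str:
    """Build a JSON run note, refusing to emit one with missing evidence."""
    missing = [name for name in AGENT_NOTE_FIELDS if not evidence.get(name)]
    if missing:
        raise ValueError("run note is missing: " + ", ".join(missing))
    return json.dumps({name: evidence[name] for name in AGENT_NOTE_FIELDS}, indent=2)


note = agent_run_note(
    dataset_id="ds-example",
    dataset_version="v3",
    source_timestamp="2024-05-01T10:00:00+00:00",
    quality_note="Completeness 95%; 10 rows rejected for type errors.",
    steward="facilities data steward",
    allowed_answer_type="read-only status summary",
    mcp_scope_plan="single dataset, read access",
)
print(note)
```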

Acceptance checklist

  • The connector test passed in the target environment.
  • Browse and preview confirmed source paths and sample values.
  • Mappings were reviewed for identity, units, timestamp, and transform expression.
  • Sync history shows that the latest run succeeded, or that its failure is explained.
  • Data quality state is visible to the workflow owner.
  • DFS Pro dataset has steward, profile, validation state, and lineage.
  • Fusion tasks record method key, config, input datasets, output dataset, and conflicts.
  • Review queue decisions are resolved or explicitly carried forward.
  • Downstream workflows receive dataset ID, source timestamps, quality notes, and reviewer boundary.

| Page | Use |
| --- | --- |
| DFS Lite Connectors | Configure and monitor source connectors. |
| Mapping Source Fields | Map source fields to target entities and review suggestions. |
| DFS Pro Datasets | Create and validate reusable datasets. |
| DFS Pro Fusion Tasks | Run multi-source merge, match, and gap-fill workflows. |
| Review Queue | Resolve conflicts and human-review items. |
| Create an AI Agent-Ready Dataset | Prepare data evidence for Agent workflows. |