Build an Operational Data Pipeline with DFS
Use this workflow when a team needs to turn source-system data into governed operational data for dashboards, Inspector workflows, predictive maintenance, Physical AI scenarios, or FactVerse AI Agent.
The working path is:
Define evidence contract -> create DFS Lite connector -> browse and preview
-> review mappings -> sync -> check quality -> create DFS Pro dataset
-> validate/profile/lineage -> fuse if needed -> hand off with evidence
When to use this workflow
| Situation | Use DFS Lite | Add DFS Pro |
|---|---|---|
| One source feeds a live operational view | Connect, map, sync, and monitor the source. | Add a dataset when the data will be reused, versioned, or reviewed. |
| Multiple systems describe the same asset or event | Bring each source into a stable connector path. | Use datasets, fusion tasks, review queue, lineage, and audit records. |
| An AI workflow needs evidence | Confirm source freshness and mappings. | Create an Agent-ready dataset with steward, profile, quality notes, and allowed use. |
| Physical AI needs operating constraints | Map scene, equipment, signal, and process fields. | Package reusable scenario input data and keep source timestamps visible. |
Capability boundaries
DFS has product UI surfaces and AI-side services. Treat the UI as the operator workflow, and treat the code-backed services in the table below as the boundary of what currently executes.
| Area | Current code-backed capability | Operator implication |
|---|---|---|
| DFS Lite source adapters | REST, CSV, MQTT, and OPC UA connector classes are implemented in the AI-side connector layer. | Confirm environment support before selecting project-specific connector templates such as building, database, or protocol variants. |
| Connection test | Connector test fetches a small sample or checks the broker/server path. | A passing test confirms reachability; mapping, preview, sync, and quality review still matter. |
| Field mapping suggestions | Mapping suggestion scores source schema against a target catalog using aliases, token overlap, type compatibility, and confidence. | Suggestions are review inputs. A person still confirms identity, units, transform expression, and target field. |
| DFS Pro dataset read path | Fusion reads dataset rows by tenant and dataset ID from registered DFS dataset tables, with row metadata carried into the frame. | Dataset IDs, table names, tenant ownership, and payload shape must be correct before fusion or Agent use. |
| Fusion execution | Registered methods include natural-key merge and real-over-synthetic gap fill. | Choose the method from the data problem, then record config, input dataset IDs, output dataset, run result, conflicts, and review state. |
| Review and audit | Frontend exposes review items, rejection queue, audit log, dataset versions, lineage, metrics, and BI report surfaces. | Use these pages to keep human decisions, rejected rows, replay status, and downstream use traceable. |
Step 1: Define the evidence contract
Start with the downstream workflow before connector details.
Write down:
- the business question or operating task;
- source systems and source owners;
- target asset, equipment, point, scene, or work-record identity;
- required fields and accepted units;
- expected timestamp field and freshness window;
- known value ranges and quality limits;
- reviewer and steward;
- allowed downstream use, such as dashboard, Agent answer, work-order draft, simulation input, or report.
Examples:
| Workflow | Evidence contract |
|---|---|
| Facility operations | Asset ID, point name, meter value, timestamp, unit, alarm state, work-order reference, source freshness. |
| Predictive maintenance | Equipment ID, signal history, component or subsystem, maintenance event, operating mode, anomaly label, engineer note. |
| Physical AI | Scene ID, model asset version, component geometry, process constraint, task sequence, operating signal, validation note. |
| Data center operations | Facility or room ID, asset ID, meter or status point, inspection record, energy calculation input, maintenance history. |
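The items above can be captured as a small structured record so that later steps check sample rows against it. The following is a minimal sketch, assuming a hypothetical `EvidenceContract` class and hypothetical field names such as `asset_id` and `meter_value`; none of these are DFS types or DFS field names.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone


@dataclass
class EvidenceContract:
    """Hypothetical record of the Step 1 items (not a DFS type)."""
    task: str
    source_systems: list[str]
    identity_field: str
    required_fields: dict[str, str]  # field name -> accepted unit ("" if unitless)
    timestamp_field: str
    freshness_window: timedelta
    value_ranges: dict[str, tuple[float, float]] = field(default_factory=dict)
    reviewer: str = ""
    steward: str = ""
    allowed_uses: list[str] = field(default_factory=list)

    def missing_fields(self, row: dict) -> list[str]:
        """Return required fields that are absent or None in a sample row."""
        needed = [self.identity_field, self.timestamp_field, *self.required_fields]
        return [name for name in needed if row.get(name) is None]

    def is_fresh(self, row: dict, now: datetime) -> bool:
        """True when the row timestamp falls inside the freshness window."""
        return now - row[self.timestamp_field] <= self.freshness_window


# Illustrative values only.
contract = EvidenceContract(
    task="Facility operations dashboard",
    source_systems=["BMS"],
    identity_field="asset_id",
    required_fields={"meter_value": "kWh", "alarm_state": ""},
    timestamp_field="timestamp",
    freshness_window=timedelta(minutes=15),
    reviewer="ops-reviewer",
    steward="data-steward",
    allowed_uses=["dashboard"],
)
now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
row = {"asset_id": "AHU-01", "meter_value": 12.5, "timestamp": now - timedelta(minutes=5)}
missing = contract.missing_fields(row)  # alarm_state is not in the sample row
fresh = contract.is_fresh(row, now)
```

Writing the contract down in this form before touching connector settings gives the reviewer and steward one artifact to sign off on.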
Step 2: Create and test the DFS Lite connector
Open:
Data Integration > Connectors
Create a connector with a name that identifies site, source system, and purpose.
Use the source type that is enabled for the environment. The AI-side connector layer currently implements:
| Source | Prepare |
|---|---|
| REST | HTTPS URL, request parameters, headers, token or API key, response shape. |
| CSV | File path or uploaded file, delimiter, header row, timestamp column, key fields. |
| MQTT | Broker URL, topic pattern, QoS, credentials, payload format, sampling expectation. |
| OPC UA | Server URL, node IDs, username/password or certificate plan, sampling expectation. |
Run Test Connection before saving or starting the connector. Save the connector after the test passes, then keep it paused while the mapping or the meaning of the source data still needs review.
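Before running the test, it can help to confirm that everything in the "Prepare" column has been collected. The sketch below assumes hypothetical setting names (`broker_url`, `topic_pattern`, and so on) derived from that column; they are not DFS configuration keys, and the real connector form may ask for different fields.

```python
# Hypothetical setting names derived from the "Prepare" column; not DFS keys.
REQUIRED_SETTINGS = {
    "rest": ["url", "auth", "response_shape"],
    "csv": ["path", "delimiter", "timestamp_column", "key_fields"],
    "mqtt": ["broker_url", "topic_pattern", "qos", "payload_format"],
    "opcua": ["server_url", "node_ids", "auth"],
}


def missing_settings(source_type: str, settings: dict) -> list[str]:
    """List prepared-information gaps for one source type."""
    required = REQUIRED_SETTINGS[source_type.lower()]
    # A value of 0 (for example QoS 0) counts as provided; None, "" and [] do not.
    return [key for key in required if settings.get(key) in (None, "", [])]


gaps = missing_settings(
    "mqtt",
    {"broker_url": "mqtts://broker.example:8883", "topic_pattern": "site/+/temp", "qos": 0},
)
```

An empty result only means the inputs are collected; reachability is still confirmed by Test Connection itself.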
Step 3: Browse and preview source data
Open the connector detail page and use browse or preview before mapping.
Check:
- expected equipment, tags, topics, tables, or files are visible;
- sample values exist for the selected node or field;
- timestamp fields are usable;
- units and scale factors are known;
- empty, stale, impossible, or duplicated values are visible before sync;
- source owner can confirm the field meaning.
Record the selected source paths. They become mapping inputs and review evidence.
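The checks above can also be applied to an exported preview sample. This is a minimal sketch, assuming the sample is available as a list of dictionaries; the function name `preview_issues`, the key names, and the range and age limits are illustrative and do not come from DFS.

```python
from datetime import datetime, timedelta, timezone


def preview_issues(rows, value_key, ts_key, valid_range, max_age, now):
    """Return (row_index, issue) pairs for empty, impossible, stale, or duplicated values."""
    issues = []
    seen = set()
    low, high = valid_range
    for index, row in enumerate(rows):
        value, ts = row.get(value_key), row.get(ts_key)
        if value is None:
            issues.append((index, "empty"))
        elif not (low <= value <= high):
            issues.append((index, "out_of_range"))
        if ts is None or now - ts > max_age:
            issues.append((index, "stale"))
        fingerprint = (ts, value)  # same timestamp and value seen earlier in the sample
        if fingerprint in seen:
            issues.append((index, "duplicate"))
        seen.add(fingerprint)
    return issues


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
sample = [
    {"value": 21.5, "ts": now - timedelta(minutes=1)},
    {"value": 21.5, "ts": now - timedelta(minutes=1)},  # repeated reading
    {"value": None, "ts": now - timedelta(minutes=2)},  # empty value
    {"value": 999.0, "ts": now - timedelta(hours=2)},   # impossible and stale
]
issues = preview_issues(sample, "value", "ts", (-40.0, 120.0), timedelta(minutes=10), now)
```

The issue list can be attached to the recorded source paths as review evidence.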
Step 4: Review field mappings
Open the mapping area for the connector.
For each source field, review:
| Mapping field | Review question |
|---|---|
| Source path | Which tag, topic field, JSON path, node ID, or CSV column produced the value? |
| Source type | Is the source type numeric, string, timestamp, boolean, or enum? |
| Target entity | Does the value belong to an asset, point, dataset field, work record, or scene context? |
| Target field | Which canonical field will downstream workflows read? |
| Transform expression | Is unit conversion, enum normalization, scaling, or timestamp parsing required? |
| Confidence and rationale | Which tokens or type hints supported an AI mapping suggestion? |
When AI mapping suggestions are enabled, use them as a review accelerator. The mapping service works from source schema and target catalog metadata only; it does not read connector credentials or raw sample values.
Accept a suggestion after checking identity and units. For high-impact workflows, keep a note that explains why the target field was accepted.
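To make the confidence and rationale columns easier to interpret, the sketch below shows one way a score could combine aliases, token overlap, and type compatibility. It is an illustration under stated assumptions, not the DFS mapping service: the 0.7/0.3 weights, the Jaccard token overlap, and the names `tokens`, `score`, and `suggest` are all hypothetical. Like the service, it uses only schema names and types, never sample values.

```python
import re


def tokens(name: str) -> set[str]:
    """Split snake_case, dotted, or camelCase names into lowercase tokens."""
    spaced = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", name)
    return {t for t in re.split(r"[^A-Za-z0-9]+", spaced.lower()) if t}


def score(source: dict, target: dict) -> dict:
    """Score one source field against one target catalog entry."""
    aliases = {alias.lower() for alias in target.get("aliases", [])}
    if source["name"].lower() in aliases:
        name_score, reason = 1.0, "alias match"
    else:
        src, tgt = tokens(source["name"]), tokens(target["name"])
        union = src | tgt
        name_score = len(src & tgt) / len(union) if union else 0.0
        reason = f"shared tokens: {sorted(src & tgt)}"
    type_score = 1.0 if source["type"] == target["type"] else 0.0
    # Assumed weighting: name evidence counts more than type compatibility.
    confidence = round(0.7 * name_score + 0.3 * type_score, 2)
    return {"target": target["name"], "confidence": confidence, "rationale": reason}


def suggest(source: dict, catalog: list[dict]) -> dict:
    """Return the highest-scoring catalog entry for a source field."""
    return max((score(source, entry) for entry in catalog), key=lambda s: s["confidence"])


catalog = [
    {"name": "supply_air_temperature", "type": "float", "aliases": ["SAT", "supplyAirTemp"]},
    {"name": "alarm_state", "type": "bool", "aliases": []},
]
best = suggest({"name": "supplyAirTemp", "type": "float"}, catalog)
partial = suggest({"name": "alarm", "type": "bool"}, catalog)
```

A high score here says nothing about units or asset identity, which is why a person still confirms both before accepting the suggestion.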
Step 5: Sync and check quality
Start the connector or trigger on-demand sync.
Open:
Data Integration > Sync History
Data Integration > Quality
Review:
- sync status and error message;
- rows read, rows written, failed rows, and latest timestamp;
- connector quality score;
- completeness, timeliness, and accuracy indicators when available;
- quota usage for connectors and daily points;
- latest failed field, failed row, or rejected payload.
Release the data to downstream workflows only after the source owner and the workflow reviewer have both seen and understood the quality state.
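The row counts and latest timestamp from Sync History can be reduced to a few simple ratios for the run note. This sketch is not the DFS quality score, whose formula is not described here; `sync_summary`, `write_ratio`, and `failed_ratio` are hypothetical names, and the freshness window is assumed to come from the evidence contract.

```python
from datetime import datetime, timedelta, timezone


def sync_summary(rows_read, rows_written, failed_rows, latest_ts, now, freshness_window):
    """Summarize one sync run from the counters shown in Sync History."""
    return {
        "write_ratio": round(rows_written / rows_read, 3) if rows_read else 0.0,
        "failed_ratio": round(failed_rows / rows_read, 3) if rows_read else 0.0,
        "is_timely": (now - latest_ts) <= freshness_window,
    }


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
summary = sync_summary(
    rows_read=1000,
    rows_written=980,
    failed_rows=20,
    latest_ts=now - timedelta(minutes=3),
    now=now,
    freshness_window=timedelta(minutes=15),
)
```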
Step 6: Promote reusable data into DFS Pro
Open:
Data Integration > Datasets
Create a DFS Pro dataset when the data needs reuse, versioning, lineage, fusion, BI, or Agent access.
The dataset description should include:
- source connector or source system;
- business purpose;
- owner and steward;
- refresh cadence;
- key identity fields;
- required timestamp field;
- known limitations;
- allowed downstream workflows.
After creation, open the dataset detail page and run:
| Check | Purpose |
|---|---|
| Preview | Confirm representative rows and payload shape. |
| Profile | Check columns, null ratio, distinct IDs, timestamp range, and outliers. |
| Validate | Mark the dataset ready after schema and identity review. |
| Versions | Keep change history visible before replacing a dataset. |
| Lineage | Show source connector, transform, fusion, and output relationships. |
| Change impact | Review downstream impact before changing a dataset version. |
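For readers who want to see what the Profile check looks for, here is a minimal sketch over exported rows. It assumes rows are dictionaries with ISO-8601 timestamp strings (which sort correctly as text), and it flags outliers with a median-absolute-deviation rule; the function name, the key names, and the factor of 5 are assumptions, not the DFS Pro profiler.

```python
from statistics import median


def profile(rows, id_key, ts_key, value_key):
    """Compute columns, null ratio, distinct IDs, timestamp range, and outliers."""
    columns = sorted({key for row in rows for key in row})
    null_ratio = {c: sum(r.get(c) is None for r in rows) / len(rows) for c in columns}
    timestamps = [r[ts_key] for r in rows if r.get(ts_key) is not None]
    values = [r[value_key] for r in rows if r.get(value_key) is not None]
    outliers = []
    if values:
        center = median(values)
        mad = median([abs(v - center) for v in values])
        # Assumed rule: a value more than 5 MADs from the median is an outlier.
        outliers = [v for v in values if mad and abs(v - center) > 5 * mad]
    return {
        "columns": columns,
        "null_ratio": null_ratio,
        "distinct_ids": len({r[id_key] for r in rows if r.get(id_key) is not None}),
        "timestamp_range": (min(timestamps), max(timestamps)) if timestamps else None,
        "outliers": outliers,
    }


rows = [
    {"asset_id": "P-1", "ts": "2024-01-01T00:00:00Z", "value": 20.0},
    {"asset_id": "P-1", "ts": "2024-01-01T01:00:00Z", "value": 21.0},
    {"asset_id": "P-2", "ts": "2024-01-01T02:00:00Z", "value": None},
    {"asset_id": "P-2", "ts": "2024-01-01T03:00:00Z", "value": 20.5},
    {"asset_id": "P-2", "ts": "2024-01-01T04:00:00Z", "value": 95.0},
]
result = profile(rows, "asset_id", "ts", "value")
```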
Step 7: Choose a fusion path when multiple sources are involved
Open:
Data Integration > Methods
Data Integration > Data Fusion
Choose the fusion method from the data problem.
| Method key | Use when | Required review |
|---|---|---|
| merge_by_natural_key | Several datasets describe the same asset, event, fault, work order, or record using stable natural keys or aliases. | Confirm natural key fields, source priority, comparable output fields, conflicts, and canonical source selection. |
| prefer_real_over_synthetic_with_gap_fill | A real dataset and a synthetic or calculated dataset share keys, and real values should take priority while the second dataset fills gaps. | Confirm key columns, value columns, tolerance percentage, source provenance, and conflict count. |
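To show what the review items for a natural-key merge refer to, the sketch below merges rows from several sources by key, lets the higher-priority source win, and records every disagreement. It is an illustration only and not the registered DFS method: the function signature, the `_sources` field, and the sample data are hypothetical.

```python
def merge_by_natural_key(datasets, key_fields, source_priority):
    """Merge rows that share a natural key.

    datasets: {source_name: [row, ...]}; source_priority lists sources, highest first.
    Returns (merged_rows, conflicts).
    """
    merged, conflicts = {}, []
    for source in source_priority:
        for row in datasets.get(source, []):
            key = tuple(row[k] for k in key_fields)
            target = merged.setdefault(key, {"_sources": []})
            target["_sources"].append(source)
            for name, value in row.items():
                if value is None:
                    continue
                if name not in target:
                    target[name] = value  # first value comes from the highest-priority source
                elif target[name] != value:
                    conflicts.append({
                        "key": key, "field": name, "kept": target[name],
                        "rejected": value, "rejected_source": source,
                    })
    return list(merged.values()), conflicts


datasets = {
    "cmms": [{"asset_id": "P-1", "status": "running", "site": "A"}],
    "bms": [
        {"asset_id": "P-1", "status": "stopped", "rated_kw": 15},
        {"asset_id": "P-2", "status": "running"},
    ],
}
merged, conflicts = merge_by_natural_key(datasets, ["asset_id"], ["cmms", "bms"])
```

In this example the `status` disagreement for `P-1` is kept as a conflict rather than silently dropped, which is the kind of item a reviewer would later resolve.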
For fusion task configuration, record:
- input dataset IDs;
- output dataset name;
- method key;
- method configuration;
- run ID and status;
- total, matched, and conflict counts;
- reviewer decision;
- unresolved conflicts or rejected rows.
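The counts in the list above can be produced alongside the fused rows. The following sketch illustrates a real-over-synthetic gap fill that also returns a run record. It is not the registered DFS implementation: the interpretation of the tolerance percentage (a conflict when the synthetic value differs from the real value by more than that percentage), the `provenance` field, and the record layout are assumptions.

```python
def gap_fill_run(real, synthetic, key_fields, value_field, tolerance_pct):
    """Prefer real values, fill missing keys from synthetic rows, and count conflicts."""
    def key_of(row):
        return tuple(row[k] for k in key_fields)

    real_by_key = {key_of(r): r for r in real if r.get(value_field) is not None}
    synth_by_key = {key_of(r): r for r in synthetic if r.get(value_field) is not None}
    rows, matched, conflicts = [], 0, []
    for key in sorted(set(real_by_key) | set(synth_by_key)):
        if key not in real_by_key:
            rows.append({**synth_by_key[key], "provenance": "synthetic"})
            continue
        rows.append({**real_by_key[key], "provenance": "real"})
        if key in synth_by_key:
            matched += 1
            real_value = real_by_key[key][value_field]
            gap = abs(synth_by_key[key][value_field] - real_value)
            # A zero real value tolerates only an exact match.
            if gap > abs(real_value) * tolerance_pct / 100:
                conflicts.append(key)
    record = {
        "method_key": "prefer_real_over_synthetic_with_gap_fill",
        "method_config": {"key_fields": key_fields, "value_field": value_field,
                          "tolerance_pct": tolerance_pct},
        "total": len(rows),
        "matched": matched,
        "conflicts": len(conflicts),
        "unresolved_conflicts": conflicts,
    }
    return rows, record


real = [
    {"asset_id": "P-1", "hour": 1, "kwh": 10.0},
    {"asset_id": "P-1", "hour": 3, "kwh": 12.0},
]
synthetic = [
    {"asset_id": "P-1", "hour": 1, "kwh": 10.2},
    {"asset_id": "P-1", "hour": 2, "kwh": 11.0},
    {"asset_id": "P-1", "hour": 3, "kwh": 15.0},
]
rows, record = gap_fill_run(real, synthetic, ["asset_id", "hour"], "kwh", tolerance_pct=5)
```

Input dataset IDs, output dataset name, run ID, and the reviewer decision would be added to the record from the actual fusion task.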
Step 8: Resolve review items and rejected rows
Open:
Data Integration > Reviews
Data Integration > Audit
Data Integration > Metrics
Use the review queue for conflicts, uncertain matches, low-confidence records, and manual decisions.
For rejected ingest rows, keep the source row evidence visible:
- source;
- source file or cursor;
- correlation ID;
- row index;
- raw payload;
- reason and field errors;
- state such as pending, acknowledged, fixed in source, or reprocessed;
- reviewer note.
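The evidence fields and states above can be modeled so that every state change carries a reviewer note. This is a sketch under stated assumptions: the `RejectedRow` class is hypothetical, and the allowed transitions are an assumed ordering of the four states named above, not the documented behavior of the rejection queue.

```python
from dataclasses import dataclass, field

# Assumed ordering of the states; the real workflow may allow other moves.
ALLOWED_TRANSITIONS = {
    "pending": {"acknowledged"},
    "acknowledged": {"fixed_in_source", "reprocessed"},
    "fixed_in_source": {"reprocessed"},
    "reprocessed": set(),
}


@dataclass
class RejectedRow:
    """Hypothetical container for rejected-row evidence (not a DFS type)."""
    source: str
    cursor: str
    correlation_id: str
    row_index: int
    raw_payload: dict
    reason: str
    field_errors: dict
    state: str = "pending"
    notes: list = field(default_factory=list)

    def move_to(self, new_state: str, reviewer_note: str) -> None:
        """Change state only along an allowed transition and only with a note."""
        if new_state not in ALLOWED_TRANSITIONS[self.state]:
            raise ValueError(f"cannot move from {self.state} to {new_state}")
        if not reviewer_note.strip():
            raise ValueError("a reviewer note is required for every state change")
        self.notes.append((self.state, new_state, reviewer_note))
        self.state = new_state


rejected = RejectedRow(
    source="meters.csv", cursor="line:42", correlation_id="run-001", row_index=41,
    raw_payload={"asset_id": "P-1", "kwh": "n/a"},
    reason="type error", field_errors={"kwh": "expected number"},
)
rejected.move_to("acknowledged", "Source owner confirmed the meter was offline.")
```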
Use audit log and metrics to show who changed a dataset, method, fusion task, review item, or replay state.
Step 9: Hand off to downstream workflows
Before handoff, package the data evidence in a short run note.
| Destination | Include |
|---|---|
| Dashboard or BI report | Dataset ID, columns, filter assumptions, refresh cadence, and owner. |
| Inspector or work-order workflow | Asset ID mapping, latest work records, inspection references, allowed draft action, and reviewer. |
| FactVerse AI Agent | Dataset ID/version, source timestamp, quality note, steward, allowed answer type, and MCP scope plan. |
| Predictive maintenance | Equipment ID, signal window, feature columns, maintenance history, anomaly labels, and engineer review state. |
| Physical AI | Scene ID, model asset version, component geometry, process constraints, simulation input dataset, and validation notes. |
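A run note can be assembled as structured text so that missing evidence is caught before handoff. The sketch below covers only two of the destinations in the table and uses hypothetical key names such as `mcp_scope_plan`; the JSON layout is an assumption, not a DFS format.

```python
import json

# Hypothetical key names derived from the "Include" column for two destinations.
REQUIRED_ITEMS = {
    "dashboard": ["dataset_id", "columns", "filter_assumptions", "refresh_cadence", "owner"],
    "agent": ["dataset_id", "dataset_version", "source_timestamp", "quality_note",
              "steward", "allowed_answer_type", "mcp_scope_plan"],
}


def build_run_note(destination: str, **items) -> str:
    """Return a JSON run note, or raise if required evidence is missing."""
    missing = [name for name in REQUIRED_ITEMS[destination] if not items.get(name)]
    if missing:
        raise ValueError(f"run note for {destination} is missing: {', '.join(missing)}")
    return json.dumps({"destination": destination, **items}, indent=2, sort_keys=True)


note = build_run_note(
    "dashboard",
    dataset_id="ds-example",  # placeholder identifier
    columns=["asset_id", "kwh", "timestamp"],
    filter_assumptions="site A only",
    refresh_cadence="hourly",
    owner="facility-ops",
)
```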
Acceptance checklist
- The connector test passed in the target environment.
- Browse and preview confirmed source paths and sample values.
- Mappings were reviewed for identity, units, timestamp, and transform expression.
- Sync history shows that the latest run succeeded, or that any failure has a recorded explanation.
- Data quality state is visible to the workflow owner.
- DFS Pro dataset has steward, profile, validation state, and lineage.
- Fusion tasks record method key, config, input datasets, output dataset, and conflicts.
- Review queue decisions are resolved or explicitly carried forward.
- Downstream workflows receive dataset ID, source timestamps, quality notes, and reviewer boundary.
Related pages
| Page | Use |
|---|---|
| DFS Lite Connectors | Configure and monitor source connectors. |
| Mapping Source Fields | Map source fields to target entities and review suggestions. |
| DFS Pro Datasets | Create and validate reusable datasets. |
| DFS Pro Fusion Tasks | Run multi-source merge, match, and gap-fill workflows. |
| Review Queue | Resolve conflicts and human-review items. |
| Create an AI Agent-Ready Dataset | Prepare data evidence for Agent workflows. |