MDM Entity Resolution Tasks

Entity resolution tasks are implementation-level DFS fusion tasks that prepare MDM output. They can create or update golden records, confirm deterministic aliases, and send uncertain matches to the steward queue.

Use this page when an implementation team is configuring or validating identity resolution. Most business stewards will use the Master Entities and Steward Queue pages instead.

Operating boundary

The MDM resolver pattern keeps responsibilities separated:

DFS and backend services own MDM persistence, tenant scope, permissions, and audit behavior.
The resolver computes candidate entities, aliases, and fuzzy matches from the provided context.
Fuzzy matches go to the steward queue instead of being silently accepted.
Source systems remain systems of record.

When describing the resolver to stakeholders, present it as a governed data preparation workflow in which steward decisions remain part of the identity process.
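The separation above can be illustrated with a minimal routing sketch. It is not the product's resolver: the function name `route_record`, the `serial` match key, the `Decision` type, and the 0.8 threshold are all assumptions made for illustration, and name similarity is computed with Python's standard `difflib`.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Optional


@dataclass(frozen=True)
class Decision:
    action: str                # "confirm_alias", "steward_queue", or "new_entity"
    entity_id: Optional[str]   # matched or suggested entity, if any
    score: float


def route_record(record, entities, fuzzy_threshold=0.8):
    """Route one source record. All field names and the threshold are hypothetical."""
    # Deterministic path: an exact match on a stable key confirms the alias.
    serial = record.get("serial")
    if serial:
        for entity in entities:
            if entity.get("serial") == serial:
                return Decision("confirm_alias", entity["id"], 1.0)

    # Fuzzy path: name similarity only suggests a candidate, it never confirms one.
    name = record.get("name", "").lower()
    best_id, best_score = None, 0.0
    for entity in entities:
        score = SequenceMatcher(None, name, entity["name"].lower()).ratio()
        if score > best_score:
            best_id, best_score = entity["id"], score

    if best_score >= fuzzy_threshold:
        return Decision("steward_queue", best_id, best_score)
    return Decision("new_entity", None, best_score)
```

In this sketch the resolver only returns a decision. Persisting the alias, creating the queue entry, and writing the audit record would stay with the backend services.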

Before you start

Confirm:

source datasets are available and stewarded;
target entity type exists;
reference data needed by the resolver is ready;
method configuration is reviewed;
source fields include stable identity signals;
downstream owners know whether the task is test, pilot, or production.

Task input

Typical inputs include:

Input	Use
Source datasets	Distinct source rows that may describe the same object.
Entity type	Target class such as device, asset, part, station, or equipment.
Match keys	Deterministic fields used for high-confidence matching.
Fuzzy fields	Names, descriptions, aliases, or attributes used to suggest candidates.
Survivorship rule	How canonical attributes are chosen when sources disagree.
Existing MDM context	Current entities, active aliases, and rejected pairs used to avoid duplicate work.

The platform should assemble the MDM context and pass it to the resolver. For production-scale runs, the resolver can hand computed entities, aliases, and fuzzy candidates back through a staged handoff and return counts to the run record. The backend then performs the final MDM write through the governed MDM services.

This keeps tenant scope, permissions, audit behavior, survivorship, temporal alias handling, and steward queue creation in the platform layer while the resolver focuses on matching.
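As a rough sketch of the staged handoff, the resolver could write its computed output to a staging location and return only counts. The JSON Lines layout, the file names, and the `stage_output` function are assumptions for illustration; the actual staging format is defined by the platform.

```python
import json
from pathlib import Path


def stage_output(stage_dir, entities, aliases, fuzzy_candidates):
    """Write resolver output to hypothetical staging files and return counts.

    The resolver does not write to MDM here. The backend is expected to read
    the staged files and perform the governed write.
    """
    stage_dir = Path(stage_dir)
    stage_dir.mkdir(parents=True, exist_ok=True)

    counts = {}
    batches = (
        ("entities", entities),
        ("aliases", aliases),
        ("fuzzy_candidates", fuzzy_candidates),
    )
    for name, rows in batches:
        # One JSON object per line keeps large batches streamable.
        with open(stage_dir / f"{name}.jsonl", "w", encoding="utf-8") as handle:
            for row in rows:
                handle.write(json.dumps(row) + "\n")
        counts[name] = len(rows)
    return counts
```

The returned counts are what the run record would carry, so staged counts can later be compared with persisted counts.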

If a domain integration uses normalized aliases, include those values in the context alongside the source alias. Treat them as domain-supplied lookup keys: they can help find candidates, but ambiguous matches should still go to the Steward Queue.
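A minimal sketch of a normalized-alias lookup follows. The normalization rule (uppercase, letters and digits only) and the function names are assumptions; a real integration supplies its own normalized keys.

```python
from collections import defaultdict


def normalize_alias(alias):
    """Hypothetical normalization: uppercase and keep only letters and digits."""
    return "".join(ch for ch in alias.upper() if ch.isalnum())


def build_alias_index(active_aliases):
    """Map each normalized key to the set of entity IDs that use it."""
    index = defaultdict(set)
    for entity_id, alias in active_aliases:
        index[normalize_alias(alias)].add(entity_id)
    return index


def lookup(index, source_alias):
    """Return (outcome, candidate_ids) for one source alias."""
    candidates = sorted(index.get(normalize_alias(source_alias), ()))
    if len(candidates) > 1:
        # The normalized key is ambiguous, so a steward has to decide.
        return "steward_queue", candidates
    if candidates:
        return "single_candidate", candidates
    return "no_candidate", candidates
```

Whether a single candidate can be confirmed without review is a configuration decision that this sketch deliberately leaves open.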

Configuration checklist

Before the resolver is run against a full dataset, review the configuration with the data owner:

Configuration area	Questions to answer
Entity type	Does the type represent a real operational object with a stable lifecycle?
Source priority	Which source wins for name, location, class, and status when sources disagree?
Deterministic keys	Which fields can safely confirm a match without steward review?
Fuzzy fields	Which fields are useful for suggestions but need human review?
Validity period	How are replacements, retired assets, and reused source IDs handled?
Rejected-pair memory	Are previous steward rejections included to suppress repeated false positives?
Normalized aliases	Which integration creates the normalized key, and what ambiguity checks are required?
Run mode	Is the run preview-only, pilot, or allowed to write approved MDM output?

Start with a narrow slice that includes clean records, known duplicates, retired objects, and at least a few hard cases. A slice with only clean records gives a false sense of readiness.
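One way to check slice coverage before a run is sketched below. It assumes each test row carries a hand-assigned `case_label` field, which is a hypothetical convention and not a platform feature.

```python
from collections import Counter

# Case categories named in the guidance above; the label strings are assumptions.
REQUIRED_CASES = {"clean", "known_duplicate", "retired", "hard_case"}


def missing_slice_cases(rows, min_per_case=1):
    """Return the required case labels that the slice does not cover well enough."""
    counts = Counter(row["case_label"] for row in rows)
    return sorted(case for case in REQUIRED_CASES if counts[case] < min_per_case)
```

An empty result means every category is represented at least `min_per_case` times. Raising `min_per_case` for `hard_case` is one way to enforce "at least a few".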

Output

The task should produce:

entities to create or update;
aliases to confirm;
fuzzy candidates for steward review;
run metrics and errors;
enough lineage to explain the decision.

For large runs, integrations should use run metrics, staged counts, persisted counts, and steward queue counts as the execution summary. Review Master Entities, Cross-Source Aliases, and the Steward Queue for the governed result.

Publish ordinary dataset rows through a separate data workflow when needed. MDM output is the identity layer; a fused dataset, report, Inspector workflow, or AI Agent workflow should consume the reviewed identity result rather than treating raw resolver output as a reusable dataset.

Implementation boundary

Use this boundary when planning an implementation:

Owner	Responsibility
DFS Lite and DFS Pro	Provide source rows, source contracts, profiles, and dataset lineage.
Resolver	Compare records, propose entities and aliases, and score fuzzy candidates.
Backend MDM services	Persist entities, aliases, fuzzy candidates, audit records, and temporal alias changes.
Steward workflow	Approve or reject ambiguous candidates and record negative decisions.
Downstream workflows	Consume reviewed entity IDs, aliases, run metrics, and handoff notes.

Reruns should be idempotent. A retry should update the intended records or queue entries without creating duplicate entities, duplicate aliases, or repeated false candidates.
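Idempotency usually comes from writing by a stable key instead of appending. The in-memory store below is a hypothetical stand-in for the governed MDM services, used only to show the keying idea.

```python
class InMemoryMdmStore:
    """Hypothetical stand-in for governed MDM persistence, for illustration only."""

    def __init__(self):
        self.aliases = {}            # (source_system, source_id) -> entity_id
        self.candidates = {}         # frozenset of two record keys -> score
        self.rejected_pairs = set()  # pairs a steward has already rejected

    def upsert_alias(self, source_system, source_id, entity_id):
        # Keyed by source identity, so a retry overwrites instead of appending.
        self.aliases[(source_system, source_id)] = entity_id

    def queue_candidate(self, record_a, record_b, score):
        # An order-independent key makes (a, b) and (b, a) the same pair.
        pair = frozenset((record_a, record_b))
        if pair in self.rejected_pairs:
            return False             # rejected-pair memory suppresses the repeat
        self.candidates[pair] = score
        return True
```

Running the same batch twice against this store leaves one alias and one queue entry per pair, which is the behavior a retry should have.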

Validation flow

Review run results

After each run, check:

number of entities created or updated;
number of aliases confirmed;
number of fuzzy candidates created;
skipped or malformed records;
task errors;
whether steward workload is acceptable.

A high fuzzy candidate volume usually means that match keys, source quality, or survivorship rules need more review.

Useful metrics:

Metric	Why it matters
Deterministic match rate	Shows how many records can be matched with stable keys.
New entity rate	Helps detect accidental entity explosion.
Fuzzy candidate rate	Estimates steward workload before production use.
Rejected-pair repeat rate	Shows whether previous decisions are being reused.
Missing-key count	Points to source mapping or data-quality work.
Downstream row movement	Shows how many fused records, work orders, or events changed after identity updates.

Review several examples from each outcome bucket, such as deterministic matches, new entities, fuzzy candidates, and skipped records. A run can look healthy in aggregate while still creating high-impact errors in one asset class or source system.
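Several of these metrics can be derived from run counts, as in the sketch below. The count keys and the choice of total processed records as the denominator are assumptions; the metric definitions in your deployment may differ.

```python
def run_metrics(counts):
    """Derive rates from raw run counts. Keys and denominators are assumptions."""
    total = counts.get("records", 0)

    def rate(key):
        # Guard against empty runs so the summary never divides by zero.
        return counts.get(key, 0) / total if total else 0.0

    return {
        "deterministic_match_rate": rate("deterministic_matches"),
        "new_entity_rate": rate("new_entities"),
        "fuzzy_candidate_rate": rate("fuzzy_candidates"),
        "missing_key_count": counts.get("missing_keys", 0),
    }
```

Computing the same summary per asset class or per source system, not only for the whole run, helps surface the localized errors that aggregate figures can hide.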

End-to-end scenario

A typical implementation path connects MDM to the rest of DFS:

1. Use DFS Lite to ingest asset records from maintenance, BMS, inspection, or spreadsheet sources.
2. Normalize source fields and map required identity signals.
3. Run an MDM resolver task for the target entity type.
4. Confirm deterministic aliases and send uncertain matches to the Steward Queue.
5. Re-run the DFS Pro fusion task that joins work orders, readings, inspections, and events to the master entity ID.
6. Hand off the reviewed dataset to Inspector workflows, AI Agent evidence retrieval, BI reporting, or another operational application.

The handoff should include the resolver run ID or task name, source slice, entity type, steward decision counts, open exceptions, and downstream refresh status.
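The handoff fields listed above could be captured in a small record like the one sketched here. The class name, field names, and default status value are hypothetical and do not represent a platform schema.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class HandoffRecord:
    """Hypothetical handoff summary; field names mirror the list above."""

    run_id: str                          # resolver run ID or task name
    source_slice: str
    entity_type: str
    steward_decisions: dict = field(default_factory=dict)  # e.g. approved/rejected counts
    open_exceptions: list = field(default_factory=list)
    downstream_refresh_status: str = "pending"

    def to_json(self):
        # Sorted keys keep the serialized record stable across runs.
        return json.dumps(asdict(self), indent=2, sort_keys=True)
```

Serializing the record makes it easy to attach to a ticket or to store alongside the run metrics.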

Handoff checklist

Task name and purpose are clear.
Target entity type is correct.
Input datasets and source fields are documented.
Match keys and fuzzy fields are reviewed.
Steward queue has an owner.
Downstream workflows know whether output is approved for use.
Known limitations and unresolved identity cases are recorded.
Run metrics are attached to the handoff record.
Re-run instructions are clear for future source refreshes.

Next step

Use Steward Queue to process fuzzy candidates created by resolver tasks.
