MDM Entity Resolution Tasks
Entity resolution tasks are implementation-level DFS fusion tasks that prepare MDM output. They can create or update golden records, confirm deterministic aliases, and send uncertain matches to the steward queue.
Use this page when an implementation team is configuring or validating identity resolution. Most business stewards will use the Master Entities and Steward Queue pages instead.
Operating boundary
The MDM resolver pattern keeps responsibilities separated:
- DFS and backend services own MDM persistence, tenant scope, permissions, and audit behavior.
- The resolver computes candidate entities, aliases, and fuzzy matches from the provided context.
- Fuzzy matches go to the steward queue instead of being silently accepted.
- Source systems remain systems of record.
When you describe the resolver to stakeholders, present it as a governed data preparation workflow rather than an automated decision-maker. Steward decisions remain part of the identity process.
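The separation above can be illustrated with a small routing sketch. It assumes a hypothetical `Candidate` record, a `Route` enum, and a `fuzzy_floor` threshold. None of these names come from DFS, and the resolver's actual scoring is not shown. Treating a weak or missing match as a new entity is also an assumption made for the example. The only point is that deterministic key matches can be confirmed, while fuzzy matches always go to review.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Route(Enum):
    CONFIRM_ALIAS = "confirm_alias"   # deterministic key match
    STEWARD_QUEUE = "steward_queue"   # fuzzy match, needs human review
    NEW_ENTITY = "new_entity"         # no plausible existing entity (assumed behavior)


@dataclass(frozen=True)
class Candidate:
    source_id: str
    entity_id: Optional[str]  # None when no existing entity was suggested
    key_match: bool           # True when a deterministic match key agreed
    fuzzy_score: float        # hypothetical similarity score in [0.0, 1.0]


def route(candidate: Candidate, fuzzy_floor: float = 0.6) -> Route:
    """Decide where a candidate goes. Fuzzy matches are never auto-accepted."""
    if candidate.entity_id is not None and candidate.key_match:
        return Route.CONFIRM_ALIAS
    if candidate.entity_id is not None and candidate.fuzzy_score >= fuzzy_floor:
        return Route.STEWARD_QUEUE
    return Route.NEW_ENTITY
```

Note that no branch turns a fuzzy score into a confirmed alias, however high the score is. That mirrors the rule that fuzzy matches are not silently accepted.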
Before you start
Confirm that:
- source datasets are available and stewarded;
- the target entity type exists;
- reference data needed by the resolver is ready;
- the method configuration is reviewed;
- source fields include stable identity signals;
- downstream owners know whether the task is test, pilot, or production.
Task input
Typical inputs include:
| Input | Use |
|---|---|
| Source datasets | Distinct source rows that may describe the same object. |
| Entity type | Target class such as device, asset, part, station, or equipment. |
| Match keys | Deterministic fields used for high-confidence matching. |
| Fuzzy fields | Names, descriptions, aliases, or attributes used to suggest candidates. |
| Survivorship rule | How canonical attributes are chosen when sources disagree. |
| Existing MDM context | Current entities, active aliases, and rejected pairs used to avoid duplicate work. |
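As a rough illustration, the inputs in the table can be collected in one structure and checked before a run. `ResolverTaskInput`, its field names, and the checks in `problems()` are hypothetical and chosen for this sketch. They are not a DFS schema.

```python
from dataclasses import dataclass, field


@dataclass
class ResolverTaskInput:
    """Hypothetical container for the inputs listed in the table above."""
    source_datasets: list[str]
    entity_type: str
    match_keys: list[str]
    fuzzy_fields: list[str]
    survivorship_rule: str
    # Existing MDM context: pairs a steward has already rejected.
    rejected_pairs: set[tuple[str, str]] = field(default_factory=set)

    def problems(self) -> list[str]:
        """Return readable issues. An empty list means none were found."""
        issues = []
        if not self.source_datasets:
            issues.append("no source datasets")
        if not self.entity_type.strip():
            issues.append("entity type is empty")
        if not self.match_keys:
            issues.append("no deterministic match keys")
        # Illustrative check: a field should not be both a key and a fuzzy field.
        overlap = sorted(set(self.match_keys) & set(self.fuzzy_fields))
        if overlap:
            issues.append(
                "fields used as both match key and fuzzy field: " + ", ".join(overlap)
            )
        return issues
```

A usage example: `ResolverTaskInput(["cmms_assets"], "asset", ["serial_number"], ["name"], "source_priority").problems()` returns an empty list, so that configuration passes these illustrative checks.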
Configuration checklist
Before the resolver is run against a full dataset, review the configuration with the data owner:
| Configuration area | Questions to answer |
|---|---|
| Entity type | Does the type represent a real operational object with a stable lifecycle? |
| Source priority | Which source wins for name, location, class, and status when sources disagree? |
| Deterministic keys | Which fields can safely confirm a match without steward review? |
| Fuzzy fields | Which fields are useful for suggestions but need human review? |
| Validity period | How are replacements, retired assets, and reused source IDs handled? |
| Rejected-pair memory | Are previous steward rejections included to suppress repeated false positives? |
| Run mode | Is the run preview-only, pilot, or allowed to write approved MDM output? |
Start with a narrow slice that includes clean records, known duplicates, retired objects, and at least a few hard cases. A slice with only clean records gives a false sense of readiness.
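The source priority question in the checklist can be made concrete with a small survivorship sketch. It assumes a per-attribute priority list in which the first source holding a non-empty value wins. `SOURCE_PRIORITY`, `pick_canonical`, and the source names are hypothetical, and a real survivorship rule may also weigh recency or data quality.

```python
from typing import Optional

# Hypothetical per-attribute source priority, highest priority first.
SOURCE_PRIORITY = {
    "name": ["maintenance", "bms", "spreadsheet"],
    "status": ["bms", "maintenance", "spreadsheet"],
}


def pick_canonical(
    attribute: str,
    values_by_source: dict[str, Optional[str]],
    priority: dict[str, list[str]] = SOURCE_PRIORITY,
) -> Optional[tuple[str, str]]:
    """Return (winning_source, value), or None if no prioritized source has a value."""
    for source in priority.get(attribute, []):
        value = values_by_source.get(source)
        # Skip missing and blank values so an empty field never wins.
        if value is not None and value.strip():
            return source, value
    return None
```

Returning the winning source together with the value keeps the choice explainable later. An attribute with no configured priority yields `None` here, which surfaces the gap instead of guessing.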
Output
The task should produce:
- entities to create or update;
- aliases to confirm;
- fuzzy candidates for steward review;
- run metrics and errors;
- enough lineage to explain the decision.
When a task returns MDM output, treat that output as the governed identity result. Publish ordinary dataset rows through a separate data workflow when needed.
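One way to picture "enough lineage" is a record that carries the source row, the rule applied, and the fields that matched. The `Decision` shape and the `explain` helper below are hypothetical. They sketch what an explainable output row could contain, not the actual MDM output format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Decision:
    """Hypothetical output row with the lineage needed to explain it."""
    action: str                      # e.g. "create", "update", "confirm_alias", "review"
    source_dataset: str
    source_id: str
    entity_id: str                   # target or proposed master entity
    rule: str                        # name of the rule that produced the decision
    matched_fields: tuple[str, ...]  # fields that agreed between source and entity


def explain(d: Decision) -> str:
    """Render a one-line explanation a steward or auditor can read."""
    fields = ", ".join(d.matched_fields) or "no fields"
    return (
        f"{d.action}: {d.source_dataset}/{d.source_id} -> {d.entity_id} "
        f"via {d.rule} on {fields}"
    )
```

If a decision cannot be rendered this way because the rule or matched fields are unknown, that is a sign the output lacks lineage.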
Validation flow
Review run results
After each run, check:
- number of entities created or updated;
- number of aliases confirmed;
- number of fuzzy candidates created;
- skipped or malformed records;
- task errors;
- whether steward workload is acceptable.
A high fuzzy candidate volume usually means that match keys, source quality, or survivorship rules need more review.
Useful metrics:
| Metric | Why it matters |
|---|---|
| Deterministic match rate | Shows how many records can be matched with stable keys. |
| New entity rate | Helps detect accidental entity explosion. |
| Fuzzy candidate rate | Estimates steward workload before production use. |
| Rejected-pair repeat rate | Shows whether previous decisions are being reused. |
| Missing-key count | Points to source mapping or data-quality work. |
| Downstream row movement | Shows how many fused records, work orders, or events changed after identity updates. |
Review several examples from each bucket. A run can look healthy in aggregate while still creating high-impact errors in one asset class or source system.
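Several of the metrics above reduce to simple ratios. The sketch below assumes that rates are computed against the total number of source records processed, and that the rejected-pair repeat rate is the share of fuzzy candidates whose pair was already rejected by a steward. These denominators, and the names `run_metrics` and `normalize_pair`, are assumptions for illustration.

```python
def normalize_pair(a: str, b: str) -> tuple[str, str]:
    """Order a pair so (a, b) and (b, a) compare equal."""
    return (a, b) if a <= b else (b, a)


def run_metrics(
    total_records: int,
    deterministic_matches: int,
    new_entities: int,
    candidate_pairs: list[tuple[str, str]],
    rejected_pairs: set[tuple[str, str]],
) -> dict[str, float]:
    """Compute illustrative run rates. Zero denominators yield 0.0."""

    def rate(numerator: int, denominator: int) -> float:
        return numerator / denominator if denominator else 0.0

    rejected = {normalize_pair(a, b) for a, b in rejected_pairs}
    # Candidates that repeat a pair a steward has already rejected.
    repeats = sum(1 for a, b in candidate_pairs if normalize_pair(a, b) in rejected)

    return {
        "deterministic_match_rate": rate(deterministic_matches, total_records),
        "new_entity_rate": rate(new_entities, total_records),
        "fuzzy_candidate_rate": rate(len(candidate_pairs), total_records),
        "rejected_pair_repeat_rate": rate(repeats, len(candidate_pairs)),
    }
```

Running the same function per asset class or per source system, instead of once for the whole run, is one way to catch the localized errors that aggregate numbers hide.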
End-to-end scenario
A typical implementation path connects MDM to the rest of DFS:
- Use DFS Lite to ingest asset records from maintenance, BMS, inspection, or spreadsheet sources.
- Normalize source fields and map required identity signals.
- Run an MDM resolver task for the target entity type.
- Confirm deterministic aliases and send uncertain matches to the Steward Queue.
- Re-run the DFS Pro fusion task that joins work orders, readings, inspections, and events to the master entity ID.
- Hand off the reviewed dataset to Inspector workflows, AI Agent evidence retrieval, BI reporting, or another operational application.
The handoff should include the resolver run ID or task name, source slice, entity type, steward decision counts, open exceptions, and downstream refresh status.
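A completeness check on the handoff record could look like the sketch below. The field names are hypothetical keys derived from the list in the previous sentence, not a DFS format. A field counts as missing only when it is absent or `None`, so an empty exceptions list or a zero count still passes.

```python
# Hypothetical keys, one per item the handoff should include.
REQUIRED_HANDOFF_FIELDS = (
    "run_id_or_task_name",
    "source_slice",
    "entity_type",
    "steward_decision_counts",
    "open_exceptions",
    "downstream_refresh_status",
)


def missing_handoff_fields(record: dict) -> list[str]:
    """List required fields that are absent or None, in a stable order."""
    return [name for name in REQUIRED_HANDOFF_FIELDS if record.get(name) is None]
```

For example, a record that has every key except `downstream_refresh_status` returns `["downstream_refresh_status"]`, which tells the receiving team exactly what to ask for.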
Handoff checklist
- Task name and purpose are clear.
- Target entity type is correct.
- Input datasets and source fields are documented.
- Match keys and fuzzy fields are reviewed.
- Steward queue has an owner.
- Downstream workflows know whether output is approved for use.
- Known limitations and unresolved identity cases are recorded.
- Run metrics are attached to the handoff record.
- Re-run instructions are clear for future source refreshes.
Next step
Use Steward Queue to process fuzzy candidates created by resolver tasks.