MDM Entity Resolution Tasks

Entity resolution tasks are implementation-level DFS fusion tasks that prepare MDM output. They can create or update golden records, confirm deterministic aliases, and send uncertain matches to the steward queue.

Use this page when an implementation team is configuring or validating identity resolution. Most business stewards will use the Master Entities and Steward Queue pages instead.

Operating boundary

The MDM resolver pattern keeps responsibilities separated:

  • DFS and backend services own MDM persistence, tenant scope, permissions, and audit behavior.
  • The resolver computes candidate entities, aliases, and fuzzy matches from the provided context.
  • Fuzzy matches go to the steward queue instead of being silently accepted.
  • Source systems remain systems of record.

Describe the resolver as a governed data preparation workflow. Steward decisions remain part of the identity process.
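
The separation above can be sketched as a small routing function. This is a minimal illustration under stated assumptions, not the resolver's actual implementation: the `Entity` class, the `serial_number` match key, the `route` function, the decision labels, the 0.8 threshold, and the use of Python's `difflib.SequenceMatcher` for name similarity are all hypothetical choices made for the example.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Optional


@dataclass(frozen=True)
class Entity:
    """Hypothetical view of an existing master entity."""
    entity_id: str
    serial_number: str
    name: str


def route(record: dict, entities: list, fuzzy_threshold: float = 0.8) -> tuple:
    """Return (decision, entity_id) for one source record."""
    # Deterministic key (assumed here to be a serial number):
    # an exact match confirms an alias without steward review.
    serial = record.get("serial_number")
    if serial:
        for entity in entities:
            if entity.serial_number == serial:
                return ("confirm_alias", entity.entity_id)

    # Fuzzy field: name similarity only suggests a candidate.
    name = (record.get("name") or "").lower()
    best_id: Optional[str] = None
    best_score = 0.0
    for entity in entities:
        score = SequenceMatcher(None, name, entity.name.lower()).ratio()
        if score > best_score:
            best_id, best_score = entity.entity_id, score

    if best_id is not None and best_score >= fuzzy_threshold:
        # Never accepted silently: the pair goes to a steward.
        return ("steward_queue", best_id)
    return ("create_entity", None)
```

In this sketch the function only computes a decision. Writing the entity, alias, or queue item stays with DFS and backend services, which matches the boundary described above.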

Before you start

Confirm:

  • source datasets are available and stewarded;
  • target entity type exists;
  • reference data needed by the resolver is ready;
  • method configuration is reviewed;
  • source fields include stable identity signals;
  • downstream owners know whether the task is test, pilot, or production.

Task input

Typical inputs include:

Input | Use
Source datasets | Distinct source rows that may describe the same object.
Entity type | Target class such as device, asset, part, station, or equipment.
Match keys | Deterministic fields used for high-confidence matching.
Fuzzy fields | Names, descriptions, aliases, or attributes used to suggest candidates.
Survivorship rule | How canonical attributes are chosen when sources disagree.
Existing MDM context | Current entities, active aliases, and rejected pairs used to avoid duplicate work.
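
As a rough illustration of how these inputs might be grouped, the sketch below collects them in one structure and checks for obvious gaps. The `ResolverTaskInput` class, its field names, and its methods are hypothetical and do not describe the product's task schema. Rejected pairs are assumed to be pairs of record identifiers stored without regard to order.

```python
from dataclasses import dataclass, field


@dataclass
class ResolverTaskInput:
    """Hypothetical container mirroring the inputs listed above."""
    source_datasets: list
    entity_type: str
    match_keys: list
    fuzzy_fields: list = field(default_factory=list)
    survivorship_rule: str = "source_priority"
    rejected_pairs: set = field(default_factory=set)

    def add_rejected_pair(self, id_a: str, id_b: str) -> None:
        # Stored unordered so (a, b) and (b, a) count as the same rejection.
        self.rejected_pairs.add(frozenset((id_a, id_b)))

    def is_rejected(self, id_a: str, id_b: str) -> bool:
        return frozenset((id_a, id_b)) in self.rejected_pairs

    def problems(self) -> list:
        """List configuration gaps to resolve before a run."""
        found = []
        if not self.source_datasets:
            found.append("no source datasets")
        if not self.entity_type:
            found.append("no entity type")
        if not self.match_keys:
            found.append("no deterministic match keys")
        return found
```

A resolver built this way could call `is_rejected` before proposing a fuzzy candidate, so that a pair a steward has already rejected is not queued again.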

Configuration checklist

Before the resolver is run against a full dataset, review the configuration with the data owner:

Configuration area | Questions to answer
Entity type | Does the type represent a real operational object with a stable lifecycle?
Source priority | Which source wins for name, location, class, and status when sources disagree?
Deterministic keys | Which fields can safely confirm a match without steward review?
Fuzzy fields | Which fields are useful for suggestions but need human review?
Validity period | How are replacements, retired assets, and reused source IDs handled?
Rejected-pair memory | Are previous steward rejections included to suppress repeated false positives?
Run mode | Is the run preview-only, pilot, or allowed to write approved MDM output?

Start with a narrow slice that includes clean records, known duplicates, retired objects, and at least a few hard cases. A slice with only clean records gives a false sense of readiness.
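
The source priority question in the checklist can be made concrete with a small survivorship sketch. It assumes one simple rule, "the highest-priority source with a non-empty value wins", which is only one of several possible survivorship rules. The function name `apply_source_priority`, the record layout, and the source names are hypothetical.

```python
def apply_source_priority(records: list, priority: dict) -> dict:
    """Pick one value per attribute from the highest-priority source that has it."""
    golden = {}
    for attribute, ordered_sources in priority.items():
        for source in ordered_sources:
            # First non-empty value reported by this source, if any.
            value = next(
                (
                    r[attribute]
                    for r in records
                    if r.get("source") == source
                    and r.get(attribute) not in (None, "")
                ),
                None,
            )
            if value is not None:
                golden[attribute] = value
                break  # A higher-priority source answered; stop looking.
    return golden


records = [
    {"source": "cmms", "name": "Pump 01", "location": ""},
    {"source": "bms", "name": "PMP-01", "location": "Plant A"},
]
priority = {
    "name": ["cmms", "bms"],
    "location": ["cmms", "bms"],
    "status": ["cmms"],
}
golden = apply_source_priority(records, priority)
# golden == {"name": "Pump 01", "location": "Plant A"}
```

Here the maintenance source wins for the name, the location falls back to the second source because the first value is empty, and the status is left out because no source supplies it. A test slice with disagreeing sources exercises exactly this kind of fallback.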

Output

The task should produce:

  • entities to create or update;
  • aliases to confirm;
  • fuzzy candidates for steward review;
  • run metrics and errors;
  • enough lineage to explain the decision.

When a task returns MDM output, treat that output as the governed identity result. Publish ordinary dataset rows through a separate data workflow when needed.
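
One way to picture this output is a list of decisions that each carry their own lineage. The sketch below is an assumption about shape only: the `Decision` and `ResolverOutput` classes, the `kind` labels, and the lineage fields are hypothetical and are not the product's output format.

```python
from dataclasses import dataclass, field


@dataclass
class Decision:
    """One resolver decision plus the lineage needed to explain it."""
    kind: str            # "entity", "alias", or "fuzzy_candidate"
    entity_id: str
    source_dataset: str
    source_row_id: str
    rule: str            # which match key or fuzzy field produced the decision


@dataclass
class ResolverOutput:
    run_id: str
    decisions: list = field(default_factory=list)
    errors: list = field(default_factory=list)

    def counts(self) -> dict:
        """Count decisions per kind for the run summary."""
        totals = {"entity": 0, "alias": 0, "fuzzy_candidate": 0}
        for d in self.decisions:
            totals[d.kind] = totals.get(d.kind, 0) + 1
        return totals

    def missing_lineage(self) -> list:
        """Return decisions that could not be explained from their lineage."""
        return [
            d for d in self.decisions
            if not (d.source_dataset and d.source_row_id and d.rule)
        ]
```

A reviewer could treat a non-empty `missing_lineage()` result as a reason to hold the output back, since those decisions cannot be traced to a source row and rule.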

Validation flow

Review run results

After each run, check:

  • number of entities created or updated;
  • number of aliases confirmed;
  • number of fuzzy candidates created;
  • skipped or malformed records;
  • task errors;
  • whether steward workload is acceptable.

High candidate volume usually means match keys, source quality, or survivorship rules need more review.

Useful metrics:

Metric | Why it matters
Deterministic match rate | Shows how many records can be matched with stable keys.
New entity rate | Helps detect accidental entity explosion.
Fuzzy candidate rate | Estimates steward workload before production use.
Rejected-pair repeat rate | Shows whether previous decisions are being reused.
Missing-key count | Points to source mapping or data-quality work.
Downstream row movement | Shows how many fused records, work orders, or events changed after identity updates.

Review several examples from each bucket. A run can look healthy in aggregate while still creating high-impact errors in one asset class or source system.
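
Several of the metrics above are simple ratios over run counts. The sketch below shows one possible set of definitions. The denominators are assumptions: the first three rates are taken over all processed records, and the rejected-pair repeat rate is taken as the share of fuzzy candidates that repeat a pair a steward already rejected. The function and key names are hypothetical.

```python
def rate(part: int, whole: int) -> float:
    """Return part / whole, or 0.0 when there is nothing to divide by."""
    return part / whole if whole else 0.0


def run_metrics(
    records_total: int,
    deterministic_matches: int,
    new_entities: int,
    fuzzy_candidates: int,
    repeated_rejected_pairs: int,
    missing_key_records: int,
) -> dict:
    """Derive review metrics from raw run counts."""
    return {
        "deterministic_match_rate": rate(deterministic_matches, records_total),
        "new_entity_rate": rate(new_entities, records_total),
        "fuzzy_candidate_rate": rate(fuzzy_candidates, records_total),
        # Assumed definition: repeats as a share of fuzzy candidates.
        "rejected_pair_repeat_rate": rate(repeated_rejected_pairs, fuzzy_candidates),
        "missing_key_count": missing_key_records,
    }


metrics = run_metrics(
    records_total=200,
    deterministic_matches=150,
    new_entities=30,
    fuzzy_candidates=20,
    repeated_rejected_pairs=5,
    missing_key_records=4,
)
```

Computing the same figures per asset class or per source system, rather than only for the whole run, is one way to surface the localized errors that aggregate numbers can hide.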

End-to-end scenario

A typical implementation path connects MDM to the rest of DFS:

  1. Use DFS Lite to ingest asset records from maintenance, BMS, inspection, or spreadsheet sources.
  2. Normalize source fields and map required identity signals.
  3. Run an MDM resolver task for the target entity type.
  4. Confirm deterministic aliases and send uncertain matches to the Steward Queue.
  5. Re-run the DFS Pro fusion task that joins work orders, readings, inspections, and events to the master entity ID.
  6. Hand off the reviewed dataset to Inspector workflows, AI Agent evidence retrieval, BI reporting, or another operational application.

The handoff should include the resolver run ID or task name, source slice, entity type, steward decision counts, open exceptions, and downstream refresh status.
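
The handoff contents listed above could be captured in a small serializable record. This is a sketch only: the `HandoffRecord` class, its field names, the default status value, and the JSON layout are hypothetical and are not a format defined by the product.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class HandoffRecord:
    """Hypothetical record attached to a reviewed dataset at handoff."""
    run_id: str                      # resolver run ID or task name
    source_slice: str
    entity_type: str
    steward_decisions: dict          # e.g. counts of approved and rejected pairs
    open_exceptions: list = field(default_factory=list)
    downstream_refresh_status: str = "pending"

    def to_json(self) -> str:
        # Sorted keys keep the record stable when compared across runs.
        return json.dumps(asdict(self), indent=2, sort_keys=True)


record = HandoffRecord(
    run_id="resolver-run-001",
    source_slice="site-a-pumps",
    entity_type="asset",
    steward_decisions={"approved": 12, "rejected": 3},
    open_exceptions=["2 candidates awaiting review"],
)
payload = record.to_json()
```

Keeping the record as plain data makes it easy to attach to a ticket or store next to the run metrics.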

Handoff checklist

  • Task name and purpose are clear.
  • Target entity type is correct.
  • Input datasets and source fields are documented.
  • Match keys and fuzzy fields are reviewed.
  • Steward queue has an owner.
  • Downstream workflows know whether output is approved for use.
  • Known limitations and unresolved identity cases are recorded.
  • Run metrics are attached to the handoff record.
  • Re-run instructions are clear for future source refreshes.

Next step

Use Steward Queue to process fuzzy candidates created by resolver tasks.