Data Engineering

Entity Resolution

The process of identifying and linking records that refer to the same real-world entity across datasets.

Definition

Entity resolution (also known as record linkage, deduplication, or entity matching) is the process of identifying records across one or more datasets that refer to the same real-world entity, and linking or merging them into a single canonical representation. For example, 'Apple Inc.', 'Apple Computer', and 'AAPL' (the company's stock ticker) may appear in different databases, yet all refer to the same company; entity resolution connects these references.

Why it matters in 2026

Entity resolution is a prerequisite for reliable enterprise AI. When AI agents query data about customers, products, or organizations, they must be able to recognize that the same entity appears under different names or identifiers across systems. Without entity resolution, AI agents double-count, miss connections, and produce inaccurate analyses. The problem has intensified as organizations merge data from acquisitions, integrate third-party data, and deploy AI across fragmented data estates.

How it works

Modern entity resolution combines rule-based matching (exact matches on identifiers like tax ID or email) with probabilistic matching (using ML models to score the likelihood that two records refer to the same entity based on name similarity, address proximity, and other features). Graph-based approaches use knowledge graphs to propagate entity identity through relationship networks.
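The hybrid approach described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not a reference implementation: the `Record` type, the field weights (0.6 for name, 0.4 for address), and the 0.85 threshold are hypothetical, and the standard-library `difflib.SequenceMatcher` ratio stands in for a trained ML model. The final step groups pairwise matches with union-find, a simple transitive-closure technique that is related to, but much lighter than, knowledge-graph propagation.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from itertools import combinations


@dataclass(frozen=True)
class Record:
    # Hypothetical record shape used only for this illustration.
    record_id: str
    name: str
    address: str
    email: str = ""
    tax_id: str = ""


def normalize(text: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    cleaned = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in text.lower())
    return " ".join(cleaned.split())


def similarity(a: str, b: str) -> float:
    """String similarity in [0, 1]; a stand-in for a learned matching model."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def match_score(r1: Record, r2: Record) -> float:
    # Rule-based step: an exact match on a strong identifier is decisive.
    for key in ("tax_id", "email"):
        v1 = getattr(r1, key).strip().lower()
        v2 = getattr(r2, key).strip().lower()
        if v1 and v1 == v2:
            return 1.0
    # Probabilistic step: weighted feature similarity (weights are assumptions).
    return 0.6 * similarity(r1.name, r2.name) + 0.4 * similarity(r1.address, r2.address)


def resolve(records: list[Record], threshold: float = 0.85) -> list[set[str]]:
    """Score every pair, then merge matches transitively with union-find."""
    parent = {r.record_id: r.record_id for r in records}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for r1, r2 in combinations(records, 2):
        if match_score(r1, r2) >= threshold:
            parent[find(r1.record_id)] = find(r2.record_id)

    clusters: dict[str, set[str]] = {}
    for r in records:
        clusters.setdefault(find(r.record_id), set()).add(r.record_id)
    return list(clusters.values())
```

For instance, two records that share the email `ops@acme.com` (compared case-insensitively) land in the same cluster regardless of how the company name is spelled. Scoring every pair is quadratic in the number of records; real systems usually add a blocking step that only compares records sharing a cheap key, such as a postcode.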

Real-world example

A global bank has customer records in 12 different systems from various acquisitions. Entity resolution identifies that 'J. Smith, 123 Main St, New York' in the retail banking system and 'John A. Smith, 123 Main Street, NY 10001' in the wealth management system are the same person, enabling a unified customer view for AI-powered personalization.
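To make the bank example concrete, here is a deterministic, rule-based sketch that reaches the same conclusion for the two Smith records. The abbreviation table, the surname-last name parsing, and all helper names are assumptions for illustration; a production system would more likely feed normalized fields like these into a probabilistic scorer than require exact equality.

```python
import re

# Hypothetical, deliberately tiny abbreviation table. "st" can also mean
# "Saint", so real systems rely on postal reference data instead.
ABBREVIATIONS = {"st": "street", "ave": "avenue", "ny": "new york"}


def normalize_address(address: str) -> str:
    """Lowercase, tokenize, expand abbreviations, and drop 5-digit ZIP codes."""
    tokens = re.findall(r"[a-z0-9]+", address.lower())
    tokens = [ABBREVIATIONS.get(t, t) for t in tokens]
    # One system stores a ZIP code and the other does not, so ignore it here.
    tokens = [t for t in tokens if not re.fullmatch(r"\d{5}", t)]
    return " ".join(tokens)


def parse_name(name: str) -> tuple[str, list[str]]:
    """Split a name into (surname, given names or initials), assuming surname last."""
    tokens = re.findall(r"[a-z]+", name.lower())
    if not tokens:
        return "", []
    return tokens[-1], tokens[:-1]


def names_compatible(a: str, b: str) -> bool:
    surname_a, given_a = parse_name(a)
    surname_b, given_b = parse_name(b)
    if not surname_a or surname_a != surname_b:
        return False
    if not given_a or not given_b:
        return True  # nothing contradicts the surname match
    first_a, first_b = given_a[0], given_b[0]
    if len(first_a) > 1 and len(first_b) > 1:
        return first_a == first_b  # two full first names must agree exactly
    return first_a[0] == first_b[0]  # an initial ("j") is compatible with "john"


def same_person(r1: dict[str, str], r2: dict[str, str]) -> bool:
    return (names_compatible(r1["name"], r2["name"])
            and normalize_address(r1["address"]) == normalize_address(r2["address"]))


retail = {"name": "J. Smith", "address": "123 Main St, New York"}
wealth = {"name": "John A. Smith", "address": "123 Main Street, NY 10001"}
print(same_person(retail, wealth))  # True
```

Both addresses normalize to `123 main street new york`, and the initial "J." is treated as compatible with "John", while a record for "Jane Smith" at the same address would be rejected because the two full first names differ.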


Further Reading