Glossary/Semantic Similarity
AI Foundations

Semantic Similarity

A measure of how alike two pieces of text are in meaning, regardless of exact wording.

Definition

Semantic similarity is a metric that quantifies how similar two pieces of text are in meaning, independent of the specific words used. A high semantic similarity score indicates that two texts convey the same or closely related ideas, even if they use completely different vocabulary. This is distinct from syntactic similarity (which measures word overlap) and is fundamental to building AI systems that understand language the way humans do.

Why it matters in 2026

Semantic similarity is a foundational capability for enterprise AI quality assurance. Organizations use it to detect when data definitions have drifted across systems, to validate that AI-generated content matches source material, to deduplicate knowledge bases, and to measure the quality of RAG retrieval. The ability to programmatically measure meaning alignment is essential for reliable AI governance.

How it works

Semantic similarity is typically computed by generating vector embeddings for both texts and then computing the cosine similarity between the resulting vectors. Cosine similarity measures the angle between two vectors in high-dimensional space — a score of 1.0 means identical direction (maximum similarity), 0.0 means perpendicular (no similarity), and -1.0 means opposite directions. Scores above 0.85 typically indicate strong semantic alignment.

Real-world example

A financial services firm uses semantic similarity to ensure consistency across their data catalog. They compare the definition of 'Net Revenue' in their data warehouse (0.94 similarity) versus their CRM (0.71 similarity) and identify that the CRM definition includes refunds while the warehouse definition excludes them — a discrepancy that would cause AI agents to produce inconsistent financial reports.

Related Terms

4 terms
Browse all 46 terms →

Further Reading