Semantic Search
Beyond keyword matching
The Problem: Traditional keyword search fails when users use different words than your documents. Searching "headache remedy" won't find "migraine treatment". How do you bridge this gap?
The Solution: Understanding Meaning, Not Words
Semantic Search uses embeddings — dense vector representations of text meaning — to find content by concept rather than keyword. Every text is converted to a point in high-dimensional space, and similar meanings land near each other. Searching for "headache remedy" finds "migraine treatment" because both map to nearby vectors, even though they share no words.
Think of it like a librarian who understands what you mean, not just what you said. The search pipeline has five steps:
- 1. Convert documents to embeddings: Each document is encoded into a dense vector and stored in a vector database
- 2. Convert query to embedding: The user's search query is encoded using the same embedding model
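- 3. Calculate cosine similarity: Score each document vector by the cosine of the angle between it and the query vector (at scale, vector databases usually approximate this with a nearest-neighbor index instead of comparing against every document)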
- 4. Rank by similarity score: Documents closest in meaning to the query float to the top of results
- 5. Return top-k results: Deliver the most semantically relevant matches, often combined with reranking
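The five steps above can be sketched in a few lines of Python. This is a minimal illustration, not a production pipeline: the three-dimensional vectors are made-up stand-ins for what a real embedding model would return, and `cosine_similarity`, `top_k`, `DOC_VECS`, and `QUERY_VEC` are hypothetical names chosen for this example.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector has no direction, so treat it as unrelated
    return dot / (norm_a * norm_b)


def top_k(query_vec, doc_vecs, k=2):
    """Score every document against the query and keep the k best."""
    scored = [
        (doc_id, cosine_similarity(query_vec, vec))
        for doc_id, vec in doc_vecs.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Steps 1-2: embeddings. These toy vectors are invented for illustration;
# a real system would get them from an embedding model.
DOC_VECS = {
    "migraine treatment": [0.9, 0.1, 0.0],
    "laptop repair guide": [0.0, 0.2, 0.9],
    "pain relief tips": [0.7, 0.3, 0.1],
}
QUERY_VEC = [0.8, 0.2, 0.0]  # stands in for the embedding of "headache remedy"

# Steps 3-5: score, rank, return the top k.
results = top_k(QUERY_VEC, DOC_VECS, k=2)
for doc_id, score in results:
    print(f"{score:.3f}  {doc_id}")
```

With these toy vectors, "migraine treatment" ranks first and "laptop repair guide" is dropped, even though the query shares no words with either. A real system would replace the hand-written vectors with model output and the linear scan with a vector database index.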
Where Is This Used?
- Knowledge Base Search: Finding relevant support articles even when users describe problems in their own words
- Documentation Search: Surfacing the right API reference page from a conceptual question
- Product Discovery: "Comfy shoes for long walks" finds "ergonomic footwear" and "orthopedic sneakers"
- Cross-Lingual Search: A query in English finds semantically matching documents written in Russian or French
- Common Pitfall (Embedding Blind Spots): Embedding models struggle with rare proper nouns, product codes, and very recent terminology. Hybrid search, which combines semantic retrieval with keyword scoring such as BM25, handles these edge cases better
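One common way to combine a keyword ranking with a semantic ranking is reciprocal rank fusion (RRF). The lesson does not say which fusion method to use, so treat this as one possible sketch. The document IDs and both input rankings are invented for illustration, and `k=60` is a conventional default rather than a tuned value.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranked list.

    Each document earns 1 / (k + rank) from every list it appears in,
    so documents ranked well by more than one retriever rise to the top.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)


# Hypothetical output of a keyword (BM25-style) retriever: it catches the
# exact product code that an embedding model might miss.
keyword_ranking = ["sku-4471 manual", "returns policy", "shoe care"]

# Hypothetical output of a semantic retriever for the same query.
semantic_ranking = ["comfy shoes guide", "sku-4471 manual", "shoe care"]

fused = reciprocal_rank_fusion([keyword_ranking, semantic_ranking])
print(fused)
```

RRF only needs rank positions, which sidesteps the problem that BM25 scores and cosine similarities live on different scales.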
Fun Fact: In classic word-embedding models such as word2vec, the vector offset from "man" to "woman" is nearly the same as the offset from "king" to "queen", so "king" - "man" + "woman" lands close to "queen". This is how embeddings capture relationships. Many modern multilingual embedding models place 100+ languages in the same vector space, so a Russian question can find an English answer.
Try It Yourself!
Try the interactive demo below to compare keyword search vs semantic search and see how meaning-based matching finds what keywords miss.
Keyword vs Semantic Search
See how the same query returns different results over this sample document set:
- Getting Started with Python Programming: A beginner guide to writing your first Python script.
- Building a REST API with Node.js: Set up routes, middleware, and deploy your backend server.
- Advanced JavaScript Patterns: Closures, prototypes, and design patterns for JS engineers.
- Introduction to Data Analysis: Use pandas and statistics to extract insights from datasets.
- Neural Network Architecture Guide: Deep dive into layers, activations, and model design.
- Understanding Transformers in AI: Attention mechanism, BERT, GPT, and the NLP revolution.
- Machine Learning Fundamentals: Core ML algorithms and how models learn from data.
- Semantic search understands synonyms: a query for "code" can match a document about "programming" even when the query's exact words never appear in the document.
- Keyword search is brittle: a single missing word can mean missing the document.
- The best systems combine both (hybrid search): keyword for exact matches, semantic for conceptual intent.
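The brittleness of keyword matching is easy to reproduce. The sketch below assumes the strictest form of keyword search, where every query term must appear in the title as a whole word; real engines such as BM25 score partial matches instead, so this is a deliberately simplified illustration. `keyword_search` and `TITLES` are names made up for this example, and the titles are a subset of the lesson's sample documents.

```python
def keyword_search(query, titles):
    """Return the titles that contain every query term as a whole word."""
    terms = query.lower().split()
    matches = []
    for title in titles:
        words = set(title.lower().split())
        if all(term in words for term in terms):
            matches.append(title)
    return matches


TITLES = [
    "Getting Started with Python Programming",
    "Building a REST API with Node.js",
    "Advanced JavaScript Patterns",
    "Machine Learning Fundamentals",
]

# Exact wording: the document is found.
exact = keyword_search("python programming", TITLES)

# Same intent, one synonym swapped in: nothing matches.
synonym = keyword_search("python coding", TITLES)

print(exact)
print(synonym)
```

Swapping "programming" for "coding" is enough to lose the only relevant document, which is the gap that embedding-based retrieval is meant to close.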
Convert a user question into a better search query
User question: laptop running slow reasons
Primary query: laptop slow performance degradation
Alternative 1 (symptoms): laptop freezing OR sluggish app loading response time
Alternative 2 (causes): laptop performance degradation causes OR overheating CPU load disk usage high
A good search query is not just "better words" — it's multiple variants covering different phrasings of the same problem, which directly improves the recall of a semantic search system.
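To show why several variants improve recall, here is a minimal sketch that merges the results of each query variant and keeps the best score per document. The document IDs and scores are invented for illustration, `merge_variant_results` is a hypothetical helper, and keeping the maximum score is only one of several reasonable merge rules.

```python
def merge_variant_results(results_per_variant, k=10):
    """Union the results of several query variants.

    results_per_variant: list of result lists, each a list of
    (doc_id, similarity_score) pairs. A document found by more than
    one variant keeps its highest score.
    """
    best = {}
    for results in results_per_variant:
        for doc_id, score in results:
            if score > best.get(doc_id, float("-inf")):
                best[doc_id] = score
    ranked = sorted(best.items(), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Hypothetical top-2 results for each variant of "laptop running slow reasons".
primary = [("slow-laptop-faq", 0.82), ("disk-cleanup", 0.61)]
symptoms = [("app-freezes", 0.77), ("slow-laptop-faq", 0.70)]
causes = [("overheating-cpu", 0.79), ("disk-cleanup", 0.66)]

merged = merge_variant_results([primary, symptoms, causes])
for doc_id, score in merged:
    print(f"{score:.2f}  {doc_id}")
```

In this toy data the primary query alone surfaces two documents, while the three variants together surface four, which is the recall gain described above.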