Embeddings
Meaning as numbers
The Problem: Computers only understand numbers. They don't know that "cat" and "kitten" are related, or that "cat" and "table" are completely different things.
The Solution: Give Each Word an "Address"
After tokenization splits text into tokens, each token needs a numeric representation. Imagine you could give each word coordinates, like GPS. Then similar words would "live" near each other.
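The "address" idea can be sketched in a few lines of code. This is a toy lookup table with made-up 4-dimensional vectors (real models learn hundreds of dimensions during training); the vocabulary and numbers are illustrative, not from any real model:

```python
# An embedding layer is essentially a lookup table mapping
# each token ID to a vector of learned numbers.
vocab = {"cat": 0, "kitten": 1, "table": 2}

# One row of "coordinates" per token. Normally these are learned;
# here they are hand-picked so "cat" and "kitten" land near each
# other and "table" lands far away.
embedding_table = [
    [0.90, 0.80, 0.10, 0.00],  # cat
    [0.85, 0.75, 0.15, 0.05],  # kitten
    [0.00, 0.10, 0.90, 0.80],  # table
]

def embed(token: str) -> list[float]:
    """Look up the vector ('address') for a token."""
    return embedding_table[vocab[token]]

print(embed("cat"))  # → [0.9, 0.8, 0.1, 0.0]
```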
The Magic: Math with Words
The amazing thing is — you can do math with these "addresses"!
King − Man + Woman ≈ Queen
The computer "understood" that a queen is the female version of a king!
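Here is the famous analogy worked through with toy vectors. The 3-dimensional vectors are hand-crafted for illustration (dimensions roughly encoding "royalty", "male", "female"); real embeddings are learned and much higher-dimensional:

```python
import numpy as np

# Toy embeddings; dimensions roughly mean [royalty, male, female].
words = {
    "king":  np.array([0.9, 0.9, 0.1]),
    "queen": np.array([0.9, 0.1, 0.9]),
    "man":   np.array([0.1, 0.9, 0.1]),
    "woman": np.array([0.1, 0.1, 0.9]),
}

# Do the arithmetic on the vectors themselves.
target = words["king"] - words["man"] + words["woman"]

def cosine(a, b):
    """Cosine similarity: 1.0 means same direction."""
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# Find the word whose vector points closest to the result.
best = max(words, key=lambda w: cosine(words[w], target))
print(best)  # → queen
```

With real learned embeddings the result is only approximately "queen", which is why the formula above uses ≈ rather than =.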
These embeddings are the foundation that attention and the Transformer architecture build upon.
Think of it like a city map:
- "Cat", "dog", "hamster" live in the Animals district
- "Apple", "banana", "orange" in the Fruits district
- "King", "queen", "prince" in the Royalty district
- "Car", "bus", "train" in the Transport district
The closer words are in meaning, the closer their "addresses" on the map.
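The map intuition can be checked numerically. Below, each word gets hypothetical 2-D "map coordinates" (real embeddings have many more dimensions, but the idea is the same): words from the same district sit closer together than words from different districts.

```python
import math

# Made-up 2-D coordinates for illustration.
coords = {
    "cat": (1.0, 1.0), "dog": (1.2, 0.9),       # Animals district
    "apple": (5.0, 5.0), "banana": (5.1, 4.8),  # Fruits district
}

def distance(a: str, b: str) -> float:
    """Straight-line distance between two words on the 'map'."""
    (x1, y1), (x2, y2) = coords[a], coords[b]
    return math.hypot(x1 - x2, y1 - y2)

print(distance("cat", "dog"))    # small: same district
print(distance("cat", "apple"))  # large: different districts
```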
Where Is This Used?
- Google Search: finds "laptop" even if you typed "portable computer"
- Translators: understand that "big" and "large" mean the same thing
- Recommendations: Netflix knows if you like "dramas", you might like "melodramas" too
- ChatGPT: understands the meaning of your question, not just keywords
Fun Fact: These "word addresses" (called embeddings) are used everywhere in modern AI!
Try It Yourself!
Below is an interactive word map. Rotate it, click on words, and see which words ended up as neighbors!
Deep Dive: Measuring Similarity
Cosine Similarity — Arrow Analogy
Imagine two embeddings as two arrows. Cosine similarity measures the angle between them. If arrows point in the same direction (small angle), words are similar: "cat" and "kitten" are nearly parallel. If arrows are perpendicular, words are unrelated. What matters is the direction, not the length — a long and short arrow pointing the same way are still considered similar.
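The arrow analogy translates directly into a short function. This is a plain-Python sketch of cosine similarity; note how scaling a vector (making the arrow longer) leaves the similarity unchanged, because only the direction matters:

```python
import math

def cosine_similarity(a, b):
    """Angle-based similarity: 1.0 = same direction, 0.0 = perpendicular."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Direction matters, not length: a vector and the same vector
# doubled are still "perfectly similar".
v = [0.9, 0.8, 0.1]
print(cosine_similarity(v, [x * 2 for x in v]))  # → 1.0 (up to rounding)
print(cosine_similarity([1, 0], [0, 1]))         # → 0.0 (perpendicular)
```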
Why High Dimensionality?
Imagine describing a person. With 3 features (height, weight, hair color), many people look alike. With 768 features, you capture subtle differences. Embeddings work the same way: more dimensions mean more nuance in meaning. But there's a trade-off: each additional dimension requires more memory and computation.
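The memory side of that trade-off is easy to estimate. Assuming float32 storage (4 bytes per number) and an illustrative vocabulary of 50,000 tokens, the embedding table alone grows linearly with the number of dimensions:

```python
# Back-of-the-envelope memory cost of an embedding table.
# vocab_size is a made-up example; float32 = 4 bytes per number.
vocab_size = 50_000

for dim in (3, 768):
    bytes_needed = vocab_size * dim * 4
    print(f"{dim:>4} dims: {bytes_needed / 1e6:.1f} MB")
# →    3 dims: 0.6 MB
# →  768 dims: 153.6 MB
```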
Example: Semantic Search in a Knowledge Base
Task: find an answer in the knowledge base to a customer question about product returns ("money back").
Keyword search fails: "Unfortunately, I could not find information for your query. Try rephrasing your question or contact a support agent."
Embedding search succeeds: "You can get a full refund within 14 days of purchase. To initiate a return, go to "My Orders" → "Request Refund." Funds will be returned to your card within 3-5 business days."
Embeddings find semantically similar texts even when keywords differ: "money back" and "refund" are different words but close vectors.
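A minimal sketch of such embedding-based search is shown below. The `embed()` function here is a fake stand-in that returns hand-made vectors for a few fixed strings; a real system would call a sentence-embedding model instead, and the documents and numbers are invented for illustration:

```python
import math

# Hand-made vectors standing in for a real embedding model.
FAKE_VECTORS = {
    "money back": [0.9, 0.8, 0.1],
    "refund policy: full refund within 14 days": [0.85, 0.75, 0.2],
    "shipping takes 3-5 business days": [0.1, 0.2, 0.9],
}

def embed(text: str) -> list[float]:
    """Fake embedding lookup; a real system would run a model here."""
    return FAKE_VECTORS[text]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(x * x for x in b)))

query = "money back"
docs = [
    "refund policy: full refund within 14 days",
    "shipping takes 3-5 business days",
]

# Rank documents by similarity to the query vector.
best = max(docs, key=lambda d: cosine(embed(query), embed(d)))
print(best)  # → the refund document, despite sharing no keywords
```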