Vision Hallucinations
When models lie with confidence
The Problem: Vision models hallucinate with perfect confidence — they see objects that don't exist, swap attributes, confuse spatial relationships, complete obscured text, and default to expected quantities. Without knowing these failure modes, you cannot build reliable applications.
The Solution: Five Types of Vision Hallucinations
Vision models don't just make mistakes — they confidently fabricate details that seem perfectly plausible. A model might "see" a cat that's actually a cushion pattern, swap left and right, invent text on a partially obscured sign, or default to expected quantities instead of actually counting. These are called hallucinations — and they are especially dangerous because the model shows no uncertainty. Understanding the five types of vision hallucinations (object, attribute, spatial, OCR, counting) is essential for building reliable multimodal applications.
Think of it like a confident witness giving wrong testimony in court:
1. Object hallucination: Model "sees" objects that don't exist — a cat from a cushion pattern, a person from a shadow
2. Attribute hallucination: Wrong color, size, or count — swaps attributes between adjacent objects
3. Spatial hallucination: Left/right and front/back confusion — the most common spatial error in vision models
4. OCR hallucination: Completes obscured or partial text with plausible but incorrect content
5. Counting hallucination: Defaults to expected quantities (12 eggs in a carton) instead of actually counting
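Each failure mode above suggests a targeted follow-up prompt that forces the model to re-verify its claim. A minimal sketch in Python (the prompt wording and the `build_verification_prompt` helper are illustrative, not a standard API):

```python
# Sketch: one targeted verification prompt per hallucination type.
# All names and prompt texts here are illustrative, not a real library API.

VERIFICATION_PROMPTS = {
    "object": (
        "Re-examine the claim '{claim}'. List only objects with direct visual "
        "evidence, and describe the exact region of the image that supports each."
    ),
    "attribute": (
        "Re-examine the claim '{claim}'. State the color, size, and count of each "
        "object separately, noting which object each attribute belongs to."
    ),
    "spatial": (
        "For the claim '{claim}', describe positions from the camera's viewpoint, "
        "and state explicitly whether 'left' means viewer-left or subject-left."
    ),
    "ocr": (
        "For the text claim '{claim}', transcribe only characters that are fully "
        "visible. Mark occluded or blurry characters as [?]."
    ),
    "counting": (
        "Do not estimate. Divide the image into a grid, label each cell, and count "
        "items cell by cell before totaling. Then re-check the claim '{claim}'."
    ),
}

def build_verification_prompt(claim: str, hallucination_type: str) -> str:
    """Return a follow-up prompt that asks the model to re-verify a claim."""
    return VERIFICATION_PROMPTS[hallucination_type].format(claim=claim)

print(build_verification_prompt("a red car on the left", "spatial"))
```

The idea is simply to send the returned prompt as a second turn: a model that hallucinated in its first answer often corrects itself when forced to cite concrete visual evidence.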
Where This Matters Most
- Quality Assurance: Detect hallucinated defects in manufacturing — model may "see" cracks that are just shadows
- Medical Imaging: Prevent false positives; the model might hallucinate tumors from imaging artifacts or noise
- Autonomous Driving: Safety-critical; the model must not hallucinate pedestrians or miss real obstacles
- Legal Document Review: Prevent fabricated clauses or amounts — hallucinated text in contracts has legal consequences
Fun Fact: Research from OpenAI (2025) shows that models are trained to "bluff" rather than express uncertainty. When a model says "I see a red car on the left" with 100% confidence, its internal confidence might actually be only 60%. This is why explicit verification prompts are so important.
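When an API exposes token log-probabilities, the gap between stated and internal confidence can be estimated directly. A rough sketch with hypothetical log-prob values; the geometric-mean heuristic below is one common proxy, not a calibrated measure:

```python
import math

def logprob_confidence(token_logprobs: list[float]) -> float:
    """Geometric-mean token probability: a rough proxy for internal confidence."""
    avg = sum(token_logprobs) / len(token_logprobs)
    return math.exp(avg)

# Hypothetical log-probs for the answer tokens of "I see a red car on the left".
# The wording sounds 100% certain, but the tokens average out near 60%.
logprobs = [-0.4, -0.6, -0.5, -0.55]
print(round(logprob_confidence(logprobs), 2))  # prints 0.6
```

A large gap between confident wording and a low geometric-mean probability is a useful signal that a verification prompt is needed.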
Try It Yourself!
Explore the gallery below: see 5 types of vision hallucinations, how they look in practice, and specific prompt strategies to catch and prevent each one.
Task: detect a counting hallucination in a vision model's response.
Model's initial answer: "There are 12 eggs in the carton."
Chain-of-Thought grid breakdown (2×6):
- [1,1]: EGG [1,2]: EGG [1,3]: EMPTY [1,4]: EGG [1,5]: EGG [1,6]: EGG
- [2,1]: EGG [2,2]: EMPTY [2,3]: EGG [2,4]: EMPTY [2,5]: EGG [2,6]: EGG
Recount: 9 eggs, 3 empty slots. Correct answer: 9 eggs in 12 slots.
Models default to "typical" quantities (12 eggs in a carton). Chain-of-Thought with grid forces counting each element individually.
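Instead of trusting the headline number, the grid output itself can be re-counted programmatically. A sketch assuming the `[row,col]: LABEL` format shown above:

```python
import re

# Grid-style chain-of-thought output, copied from the demo above.
GRID_ANSWER = """
[1,1]: EGG [1,2]: EGG [1,3]: EMPTY [1,4]: EGG [1,5]: EGG [1,6]: EGG
[2,1]: EGG [2,2]: EMPTY [2,3]: EGG [2,4]: EMPTY [2,5]: EGG [2,6]: EGG
"""

def recount(grid_text: str, item_label: str = "EGG") -> int:
    """Count cells labeled item_label instead of trusting the model's total."""
    labels = re.findall(r"\[\d+,\d+\]:\s*(\w+)", grid_text)
    return sum(1 for label in labels if label == item_label)

claimed = 12                   # the model's headline answer
actual = recount(GRID_ANSWER)  # counts cell by cell: 9
if actual != claimed:
    print(f"Counting hallucination: claimed {claimed}, grid shows {actual}")
```

Re-counting the model's own per-cell labels catches the mismatch even when the headline answer defaults to the "typical" quantity.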
This lesson is part of a structured LLM course.