Text Classification
Categorizing content
The Problem: You have thousands of texts that need to be sorted into categories. Manual classification is tedious. How can AI help?
The Solution: An Automatic Sorting Hat
Text classification uses an LLM to assign one or more predefined labels to a piece of text. Instead of writing brittle keyword rules, you describe the categories in plain language and let the model read the text and decide. Think of a triage nurse in the ER: every patient is assessed and routed to the right department — fast, consistent, and based on a holistic read of the situation rather than a single symptom. The classic tasks are sentiment analysis (positive / negative / neutral), spam detection, topic labeling, and intent recognition.
How it works
Under the hood, the model turns your text into embeddings (numeric vectors that capture meaning) and uses that internal representation to predict the most likely label. With an instruction-tuned LLM you do not even need training data: a clear prompt listing the categories often works in zero-shot mode. Accuracy usually improves noticeably once you add a few labeled examples directly in the prompt (few-shot), especially for the categories the model keeps confusing. For high-volume or latency-sensitive pipelines, a smaller fine-tuned model or a dedicated classifier can be cheaper and faster than calling a large general model on every request. A practical tip: ask the model to return a structured answer like {"label": "spam", "confidence": 0.92} so you can act on the confidence, not just the label. Keep in mind that a confidence number the model writes out itself is not necessarily a calibrated probability, so check your threshold against a set of hand-labeled examples before relying on it.
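The difference between zero-shot and few-shot prompting comes down to what goes into the prompt. Here is a minimal sketch, assuming a plain-text prompt sent to whatever instruction-tuned LLM you use; `build_prompt` and the exact prompt wording are hypothetical illustrations, not any provider's API.

```python
# Minimal sketch: build a zero-shot or few-shot classification prompt.
# The function name and prompt wording are illustrative assumptions, not any
# provider's API; pass the returned string to whichever LLM client you use.
from typing import List, Optional, Tuple


def build_prompt(
    text: str,
    labels: List[str],
    examples: Optional[List[Tuple[str, str]]] = None,
) -> str:
    """Return a classification prompt that asks for a JSON label + confidence.

    labels:   the closed set of allowed labels
    examples: optional (example_text, label) pairs; supplying them turns the
              zero-shot prompt into a few-shot prompt
    """
    lines = [
        "Classify the text into exactly one of these labels: "
        + ", ".join(labels) + ".",
        'Reply with JSON only: {"label": <label>, "confidence": <number from 0 to 1>}.',
    ]
    if examples:
        # Few-shot: show the model labeled input/output pairs before the real text
        lines.append("Examples:")
        for example_text, label in examples:
            lines.append(f'"{example_text}" -> {label}')
    lines.append(f'Text to classify: "{text}"')
    return "\n".join(lines)


zero_shot = build_prompt("Win a free phone now!", ["spam", "not_spam"])
few_shot = build_prompt(
    "Win a free phone now!",
    ["spam", "not_spam"],
    examples=[("Lunch at noon?", "not_spam"), ("Claim your prize!!!", "spam")],
)
print(few_shot)
```

The only change between the two modes is the optional examples section, which is why few-shot is a cheap first fix when the model keeps confusing two labels: add examples for exactly those labels.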
When to use it — and the pitfalls
Reach for an LLM classifier when categories are nuanced, change often, or depend on context that simple rules miss. The biggest pitfalls are ambiguous boundaries (a complaint that is also a feature request), sarcasm ("Oh great, another broken update!" reads as positive on the surface), and class imbalance, where a rare category gets ignored. Always set a confidence threshold and route low-confidence items to human review or an "uncertain" bucket. Worked example: to sort support tickets, define the labels (Bug, Billing, Feature request, Other), describe each boundary, add two example tickets per label, then ask the model to output a label plus a confidence. A ticket like "I was charged twice this month" returns Billing: 0.97 and is auto-routed; anything below 0.6 goes to a person.
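The routing rule in the worked example fits in a few lines. This is a minimal sketch, assuming the model's reply has already been parsed into a label and a confidence; the `Classification` type and the `"human_review"` queue name are hypothetical.

```python
# Minimal sketch of the routing rule from the worked example: auto-route
# confident predictions, send the rest to a person. Queue names are hypothetical.
from dataclasses import dataclass

REVIEW_THRESHOLD = 0.6  # "anything below 0.6 goes to a person"


@dataclass
class Classification:
    label: str
    confidence: float


def route(result: Classification, threshold: float = REVIEW_THRESHOLD) -> str:
    """Return the queue a ticket should land in."""
    if result.confidence >= threshold:
        return result.label  # confident enough: auto-route to that label's queue
    return "human_review"  # below threshold: a person decides


print(route(Classification("Billing", 0.97)))  # Billing
print(route(Classification("Bug", 0.45)))  # human_review
```

Keeping the threshold as a parameter makes it easy to tune per deployment once you have measured how often auto-routed tickets turn out to be wrong.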
The same workflow, applied to sorting an email inbox:
1. Define categories: List all labels: Spam, Important, Social, Promotions
2. Describe boundaries: Clarify what belongs where, e.g. "promotional newsletters go to Promotions, not Spam"
3. Provide examples (few-shot): Show 2-3 examples per category, especially for ambiguous cases
4. AI classifies with confidence: The model assigns a label and a confidence score (e.g., "Spam: 92%")
5. Handle ambiguous cases: Low-confidence items go to human review or get multiple labels
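Step 5 can be sketched as a small decision function. This assumes you already have one score per label, for example by asking the model to rate each label separately; that setup, the function name `assign_labels`, and the 0.6 cut-off are hypothetical choices for illustration.

```python
# Minimal sketch of step 5: give an email every label whose score clears a
# threshold (multi-label), or mark it "uncertain" for human review.
# Assumes you already have one score per label, e.g. from asking the model to
# rate each label separately; that setup and the 0.6 cut-off are hypothetical.
from typing import Dict, List


def assign_labels(scores: Dict[str, float], accept: float = 0.6) -> List[str]:
    """Return accepted labels, highest score first, or ["uncertain"]."""
    chosen = [label for label, score in scores.items() if score >= accept]
    if not chosen:
        return ["uncertain"]  # nothing is convincing: route to human review
    return sorted(chosen, key=lambda label: scores[label], reverse=True)


print(assign_labels({"Spam": 0.05, "Important": 0.81, "Social": 0.66, "Promotions": 0.10}))
# ['Important', 'Social']
print(assign_labels({"Spam": 0.40, "Important": 0.30, "Social": 0.20, "Promotions": 0.55}))
# ['uncertain']
```

Using one cut-off both for accepting a label and for adding extra labels is a simplification; a stricter second threshold for additional labels would keep most emails single-labeled.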
Where Is This Used?
- Sentiment Analysis: Positive, negative, or neutral feedback
- Spam Detection: Filtering unwanted messages
- Topic Labeling: Categorizing articles or support tickets
- Intent Recognition: Understanding what users want
Common Pitfall (Edge Cases): Multi-label texts (a complaint that is also a feature request), sarcasm, and ambiguous categories can confuse classifiers, so always define what happens at the boundaries.
Fun Fact: Classification breaks in fascinating ways: sarcastic reviews ("Oh great, another broken product!") often get classified as positive, multi-label texts stump single-label classifiers, and cultural context shifts meaning entirely. Production systems always need a confidence threshold and an "uncertain" bucket.
Key Insight
- Classification = mapping text to a category. The model reads the wording and context of the whole text, not just isolated keywords, to decide.
- Confidence matters: low confidence usually signals an ambiguous text. In production, route these items to human review.
- Sarcasm, multi-topic texts, and mixed intents are the hardest cases; multi-topic and mixed-intent texts in particular call for multi-label classification.
Frequently asked questions
How does zero-shot classification differ from fine-tuned models?
Zero-shot uses general LLM knowledge to classify without training examples. Fine-tuned models are trained on labeled data for higher accuracy on specific categories but require time and data to set up.
How many categories can an LLM handle at once?
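As a rough guide, modern LLMs handle 20–50 categories in a single prompt reasonably well, though accuracy tends to drop as the list grows or the labels overlap. For larger taxonomies, use hierarchical classification: first pick a broad category, then a subcategory within it.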
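Hierarchical classification can be sketched as two calls to the same classifier, each with a short label list. The two-level `TAXONOMY` and `toy_classifier` below are hypothetical stand-ins so the example runs on its own; in practice `classify` would wrap an LLM prompt that picks one label from the list it receives.

```python
# Minimal sketch of two-stage (hierarchical) classification.
# TAXONOMY and toy_classifier are hypothetical; in practice `classify` would
# wrap an LLM call that picks one label from the list it is given.
from typing import Callable, Dict, List, Tuple

TAXONOMY: Dict[str, List[str]] = {
    "Billing": ["Refund", "Double charge", "Invoice"],
    "Technical": ["Login", "Crash", "Performance"],
}

Classifier = Callable[[str, List[str]], str]


def classify_hierarchical(text: str, classify: Classifier) -> Tuple[str, str]:
    """Pick a broad category first, then a subcategory within that branch only."""
    broad = classify(text, list(TAXONOMY))  # stage 1: short list of broad labels
    sub = classify(text, TAXONOMY[broad])  # stage 2: only that branch's labels
    return broad, sub


def toy_classifier(text: str, labels: List[str]) -> str:
    """Stand-in for an LLM: returns the first label whose name appears in the text."""
    lowered = text.lower()
    for label in labels:
        if label.lower() in lowered:
            return label
    return labels[0]  # toy fallback only; a real classifier would flag "uncertain"


print(classify_hierarchical("Technical problem: login fails after the update", toy_classifier))
# ('Technical', 'Login')
```

Each stage only sees a handful of labels, so neither prompt has to carry the full taxonomy; the trade-off is that a wrong broad category in stage 1 cannot be fixed in stage 2.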
When should I use LLM vs traditional ML for classification?
Use LLMs for rapid prototyping, changing categories, or low-data scenarios. Use traditional ML (fine-tuned BERT, logistic regression) when you need consistent high accuracy on stable categories with abundant labeled data.
How do I ensure consistent output format from an LLM classifier?
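Use structured output (JSON mode), provide explicit format examples in your prompt, and validate every response in code. JSON mode typically ensures syntactically valid JSON, not that the right fields and allowed labels are present, so check those yourself and fall back to an "uncertain" result when a reply fails. Some APIs also offer function calling or schema-enforced structured output modes that constrain the response to a JSON schema you supply.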
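Validation logic for a JSON-mode reply might look like this minimal sketch. It reuses the support-ticket labels from the worked example as an assumed closed list; `parse_classification` and the fallback value are hypothetical names.

```python
# Minimal sketch of validation logic for a JSON-mode reply.
# ALLOWED_LABELS reuses the support-ticket labels from the worked example;
# the "uncertain" fallback mirrors the review bucket described above.
import json

ALLOWED_LABELS = {"Bug", "Billing", "Feature request", "Other"}
FALLBACK = {"label": "uncertain", "confidence": 0.0}


def parse_classification(raw: str) -> dict:
    """Return {"label", "confidence"} if the reply is valid, else a copy of FALLBACK."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return dict(FALLBACK)  # not JSON at all
    if not isinstance(data, dict):
        return dict(FALLBACK)  # JSON, but not an object
    label = data.get("label")
    confidence = data.get("confidence")
    if not isinstance(label, str) or label not in ALLOWED_LABELS:
        return dict(FALLBACK)  # label outside the closed list
    if isinstance(confidence, bool) or not isinstance(confidence, (int, float)):
        return dict(FALLBACK)  # missing or non-numeric confidence
    if not 0.0 <= confidence <= 1.0:
        return dict(FALLBACK)  # confidence out of range
    return {"label": label, "confidence": float(confidence)}


print(parse_classification('{"label": "Billing", "confidence": 0.97}'))
# {'label': 'Billing', 'confidence': 0.97}
print(parse_classification("Sure! The label is Billing."))
# {'label': 'uncertain', 'confidence': 0.0}
```

A real pipeline might retry the request once before falling back; either way, malformed replies end up in the same review bucket as low-confidence ones instead of crashing the pipeline.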
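Example: Classifying a support ticket by category and priority
Asked vaguely, the model returns a single, loosely worded label: "This ticket belongs to the 'account issues' category. The user needs help logging in."
With a closed category list, prioritization rules, and a multi-label output format, the same ticket comes back as: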
{
  "primary_category": "auth",
  "secondary_categories": ["billing", "data"],
  "priority": "critical",
  "priority_reason": "Paid user blocked for 3 days, needs document access for work, password reset not working",
  "confidence": 0.95
}
Closed category list + prioritization rules + multi-label format yield precise, actionable classification instead of a vague single label.
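To act on that multi-label output in code, you can parse it into a typed record and enforce the closed lists. A minimal sketch: only "auth", "billing", "data", and "critical" come from the example above, so the rest of `CATEGORIES` and `PRIORITIES`, along with the `TicketClassification` and `parse_ticket` names, are hypothetical.

```python
# Minimal sketch: parse the multi-label ticket JSON into a typed record and
# enforce the closed lists. Only "auth", "billing", "data" and "critical"
# come from the example above; the rest of both lists is assumed.
import json
from dataclasses import dataclass
from typing import List

CATEGORIES = {"auth", "billing", "data", "bug", "other"}  # assumed closed list
PRIORITIES = {"low", "medium", "high", "critical"}  # assumed priority scale


@dataclass
class TicketClassification:
    primary_category: str
    secondary_categories: List[str]
    priority: str
    priority_reason: str
    confidence: float


def parse_ticket(raw: str) -> TicketClassification:
    """Build a TicketClassification, raising ValueError when a rule is broken.

    Malformed input can also raise KeyError (missing key), TypeError, or
    json.JSONDecodeError (itself a ValueError subclass).
    """
    data = json.loads(raw)
    result = TicketClassification(
        primary_category=data["primary_category"],
        secondary_categories=list(data.get("secondary_categories", [])),
        priority=data["priority"],
        priority_reason=data.get("priority_reason", ""),
        confidence=float(data["confidence"]),
    )
    if result.primary_category not in CATEGORIES:
        raise ValueError(f"unknown primary category: {result.primary_category!r}")
    extra = set(result.secondary_categories) - CATEGORIES
    if extra:
        raise ValueError(f"unknown secondary categories: {sorted(extra)}")
    if result.primary_category in result.secondary_categories:
        raise ValueError("primary category repeated among secondary categories")
    if result.priority not in PRIORITIES:
        raise ValueError(f"unknown priority: {result.priority!r}")
    if not 0.0 <= result.confidence <= 1.0:
        raise ValueError(f"confidence out of range: {result.confidence}")
    return result


reply = """{"primary_category": "auth", "secondary_categories": ["billing", "data"],
"priority": "critical", "priority_reason": "Paid user blocked for 3 days", "confidence": 0.95}"""
ticket = parse_ticket(reply)
print(ticket.primary_category, ticket.priority)  # auth critical
```

Raising on a broken rule (rather than silently accepting it) lets the caller decide whether to retry the request or send the ticket to a person.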
This lesson is part of a structured LLM course.