Per-Model Prompting Guides
Optimize prompts for Claude, GPT, Gemini & open-source models
The Problem: You write a great prompt that works perfectly on ChatGPT, but when you try it on Claude, the output is worse. When you try it on Llama, it completely ignores your system message. Why?
The Solution: Speak Each Model's Dialect
Per-model prompting is the practice of adapting the same task to the conventions each LLM family was trained on, instead of reusing one generic prompt everywhere. The underlying architecture is similar across providers (a transformer predicting the next token), but the training data, instruction tuning, and recommended formatting differ. Those differences are why a prompt that shines on one model can feel mediocre on another. Think of it as speaking the same language in different dialects: the grammar is shared, but the idioms, punctuation, and politeness norms are not.
How it works
Each provider documents formatting that aligns with how the model was tuned. Anthropic recommends wrapping prompt sections in XML-style tags such as <document> and <instructions>, because Claude was trained on data where that structure carried meaning, and it follows the system prompt closely. OpenAI's GPT models lean on markdown headers, native function calling, and JSON mode for structured output. Google's Gemini is strongly multimodal and benefits from placing images near the start of the prompt, plus search grounding for fresh facts. Open-source models (Llama, Qwen, DeepSeek) each expect an exact chat template with special tokens. Qwen, for example, uses the ChatML markers <|im_start|> / <|im_end|>, while Llama and DeepSeek define their own tokens; skip the template or use the wrong one and the model may ignore your system message entirely. The context window also varies widely, from a few thousand tokens on small open models to 200K–1M on frontier ones, so prompt length is part of the dialect too.
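To make the formatting difference concrete, here is a minimal sketch that renders the same prompt sections once with XML-style tags and once with markdown headers. The function names and the section keys are illustrative assumptions, not part of any provider's SDK.

```python
from typing import Dict


def render_xml(sections: Dict[str, str]) -> str:
    """Wrap each section in an XML-style tag named after its key."""
    return "\n".join(
        f"<{name}>\n{body}\n</{name}>" for name, body in sections.items()
    )


def render_markdown(sections: Dict[str, str]) -> str:
    """Put each section under a level-2 markdown header."""
    return "\n\n".join(
        f"## {name.replace('_', ' ').title()}\n{body}"
        for name, body in sections.items()
    )


# Hypothetical sections for the same task, written once and rendered twice.
sections = {
    "instructions": "Extract the invoice total.",
    "output_format": "Reply with the number only.",
}

xml_prompt = render_xml(sections)       # tag-style dialect
md_prompt = render_markdown(sections)   # header-style dialect
```

Keeping the content in one dictionary and the dialect in a renderer means the task wording is maintained in a single place while the formatting varies per model.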
When to use it, tradeoffs, and a worked example
Adapt per model whenever output quality, cost, or reliability matters in production, and especially when you support several providers behind one feature. The main tradeoff is maintenance: keeping parallel prompt variants is more work than one shared template, and providers change defaults over time, so guides drift. A common pitfall is the "one prompt fits all" assumption, along with copying XML scaffolding onto GPT (where markdown reads more cleanly) or omitting the chat template on an open-source model. Worked example: say you extract an invoice total from a PDF. On Claude you would put the file inside <document> tags and ask for the answer inside <total> tags. On GPT you would request JSON mode and define a function returning {"total": number}. On Gemini you would attach the file first, then the instruction. Same task, three dialects, and usually better results than forcing all three through one prompt.
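Because the worked example asks for the total in two different output shapes, the calling code needs a parser per dialect. The sketch below assumes replies shaped like `<total>…</total>` or `{"total": number}` as described above; the helper names are hypothetical and the currency cleanup is deliberately simplistic.

```python
import json
import re
from typing import Optional

# Matches the text between <total> and </total>, ignoring surrounding spaces.
TOTAL_TAG = re.compile(r"<total>\s*([^<]+?)\s*</total>")


def _to_number(text: str) -> float:
    """Strip thousands separators and a dollar sign, then parse as float."""
    return float(text.replace(",", "").replace("$", "").strip())


def total_from_tags(reply: str) -> Optional[float]:
    """Parse a tag-style reply; return None if the tag or number is missing."""
    match = TOTAL_TAG.search(reply)
    if match is None:
        return None
    try:
        return _to_number(match.group(1))
    except ValueError:
        return None


def total_from_json(reply: str) -> Optional[float]:
    """Parse a JSON-style reply; return None unless "total" is a number."""
    try:
        data = json.loads(reply)
    except json.JSONDecodeError:
        return None
    value = data.get("total") if isinstance(data, dict) else None
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        return None
    return float(value)
```

Both parsers return the same type, so the rest of the pipeline does not need to know which model produced the answer.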
Think of it like speaking different dialects of the same language:
1. Learn the model's native format: Claude → XML tags, GPT → markdown/JSON, Gemini → structured templates, Open-source → chat templates with special tokens
2. Use model-specific features: Claude: prefilled assistant responses. GPT: function calling, JSON mode. Gemini: search grounding, image-first multimodal. Llama: LoRA adapters.
3. Adapt prompt length and structure: Claude and Gemini handle very long contexts well (200K-1M). GPT works best with focused, concise prompts. Many open-source models degrade beyond roughly 8-32K tokens, although newer releases advertise longer windows.
4. Test and compare: Run the same task on multiple models, compare outputs, then optimize the prompt for your chosen model's strengths
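The "test and compare" step can be sketched as a small harness that runs each model's own prompt variant over the same cases and reports a hit rate. Everything here is an assumption for illustration: the model callables are stubs standing in for real API clients, and "expected string appears in the reply" is a crude scoring rule.

```python
from typing import Callable, Dict, List, Tuple

ModelFn = Callable[[str], str]  # takes a prompt, returns the model's reply


def compare(
    prompts: Dict[str, str],
    models: Dict[str, ModelFn],
    cases: List[Tuple[str, str]],
) -> Dict[str, float]:
    """Return, per model, the fraction of cases whose reply contains the expected text."""
    scores: Dict[str, float] = {}
    for name, call in models.items():
        template = prompts[name]  # each model gets its own dialect
        hits = 0
        for task_input, expected in cases:
            reply = call(template.format(input=task_input))
            if expected in reply:
                hits += 1
        scores[name] = hits / len(cases) if cases else 0.0
    return scores


# Hypothetical stubs: one echoes the prompt back, one returns nothing.
def stub_echo(prompt: str) -> str:
    return prompt


def stub_silent(prompt: str) -> str:
    return ""


prompts = {
    "model_a": "<document>{input}</document>",
    "model_b": "## Document\n{input}",
}
models = {"model_a": stub_echo, "model_b": stub_silent}
cases = [("Total due: 42.00", "42.00"), ("Total due: 7.50", "7.50")]

scores = compare(prompts, models, cases)
```

Swapping the stubs for real client calls keeps the harness unchanged, which makes it easy to rerun when a provider updates a model.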
Model-Specific Prompt Formats
- Claude (Anthropic): XML tags for structure, extended thinking, 200K context, prefilled responses, strong system prompt adherence
- GPT-4 / GPT-5 (OpenAI): JSON mode, function calling, markdown formatting preferred, instruction prioritization (later instructions win)
- Gemini (Google): True multimodal (images first in prompt), 1M+ token context, search grounding, structured prompt templates (reportedly up to 40% better accuracy)
- Open-Source (Llama, DeepSeek, Qwen): Chat templates required (for example ChatML's <|im_start|>/<|im_end|> on Qwen; Llama and DeepSeek use their own tokens), explicit formatting, shorter prompts work better, model-specific system prompt formats
- Common Pitfall: One Prompt Fits All: A prompt optimized for GPT-4 may underperform on Claude, reportedly by 20-30%, because Claude responds better to XML structure than to markdown headers. Always adapt to the model.
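Since context size is part of each format above, a quick pre-flight check can catch prompts that will not fit. This is a rough sketch: the limits are the figures quoted in this lesson (with 32K taken as an assumed ceiling for open-source models), and the four-characters-per-token ratio is a crude assumption, not a real tokenizer.

```python
# Context limits as quoted in this lesson; real limits depend on the exact model.
CONTEXT_LIMITS = {
    "claude": 200_000,
    "gemini": 1_000_000,
    "open_source": 32_000,  # assumed ceiling; many open models differ
}

CHARS_PER_TOKEN = 4  # rough heuristic for English text, not a tokenizer


def estimate_tokens(text: str) -> int:
    """Estimate the token count by rounding up len(text) / CHARS_PER_TOKEN."""
    return -(-len(text) // CHARS_PER_TOKEN)


def fits_context(text: str, family: str, reserve_for_output: int = 1_000) -> bool:
    """True if the estimated prompt size plus an output reserve fits the family's limit."""
    return estimate_tokens(text) + reserve_for_output <= CONTEXT_LIMITS[family]
```

For anything close to the limit, count tokens with the model's actual tokenizer rather than relying on the character heuristic.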
Fun Fact: Anthropic's guidance is that wrapping prompt sections in tags like <instructions>, <context>, <output_format> helps Claude tell the parts of a prompt apart, which is why XML structure often improves its output. GPT models, on the other hand, are commonly prompted with markdown headers (## Instructions); XML can work there too, but markdown tends to read more cleanly.
Per-Model Prompting Guide
Claude (Anthropic)
Strengths:
- Long context (200K tokens)
- XML-structured prompts
- Extended thinking
- Strong system prompt adherence
Best format: XML tags: <instructions>, <context>, <examples>, <output_format>
Unique features:
- Prefilled assistant responses
- XML tag parsing trained into model
- Chain-of-thought via extended thinking

GPT (OpenAI)
Strengths:
- Function calling & tool use
- JSON mode for structured output
- Strong instruction following
- Later instructions prioritized
Best format: Markdown headers: ## Role, ## Instructions, ## Examples, ## Output
Unique features:
- JSON mode (syntactically valid JSON; it does not enforce a schema)
- Function/tool calling API
- Structured Outputs schema

Gemini (Google)
Strengths:
- Massive context (1M+ tokens)
- Native multimodal (images, video, audio)
- Search grounding
- Structured templates (reportedly up to 40% better accuracy)
Best format: Structured templates with clear sections. Images/media at the START of the prompt.
Unique features:
- Search grounding (live web data)
- Image-first multimodal processing
- 1M+ context for entire codebases

Open-Source (Llama, DeepSeek, Qwen)
Strengths:
- Full local control, no API costs
- Fine-tuning with LoRA/QLoRA
- Custom deployment options
- No data leaves your servers
Best format: Chat templates with special tokens, for example ChatML: <|im_start|>system, <|im_start|>user, <|im_start|>assistant
Unique features:
- LoRA/QLoRA fine-tuning for custom tasks
- Quantization for edge deployment
- No API rate limits when self-hosted (usage is still subject to each model's license)
Frequently asked questions
Why does the same prompt behave differently on different LLMs?
Because Claude, GPT, Gemini, and open-source models were trained on different data with different instruction tuning, and they expect different formatting. Claude is tuned for XML tags, GPT for markdown and function calling, Gemini for multimodal input, and open-source models for exact chat templates. The transformer architecture is shared, but the conventions differ, so a prompt optimized for one model can underperform on another.
How should I structure a prompt for Claude?
Claude responds well to XML tags: wrap sections in <document>, <instructions>, <context>, and <output_format>. It follows the system prompt closely and supports up to 200K context tokens, prefilled assistant responses, and extended thinking. For extraction tasks, specify the output format directly in tags, e.g. ask for the answer inside <total>. This is more reliable than markdown headers, which work better on GPT.
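A prefilled assistant response can be sketched as a message list whose last entry is a partial assistant turn, which the model then continues. The role/content message shape follows Anthropic's Messages API; the helper names are hypothetical, no API call is made here, and support for prefill should be checked against the current documentation for the model you use (it is not available together with extended thinking).

```python
from typing import Dict, List


def build_prefilled_messages(document: str, prefill: str = "<total>") -> List[Dict[str, str]]:
    """Build a user turn with tagged sections plus a partial assistant turn."""
    user_content = (
        f"<document>\n{document}\n</document>\n"
        "<instructions>\n"
        "Extract the invoice total and reply inside <total> tags.\n"
        "</instructions>"
    )
    return [
        {"role": "user", "content": user_content},
        # The trailing assistant message is the prefill the model continues from.
        {"role": "assistant", "content": prefill},
    ]


def join_prefill(prefill: str, continuation: str) -> str:
    """The reply contains only the continuation, so re-attach the prefill."""
    return prefill + continuation


messages = build_prefilled_messages("Invoice 17. Total due: 42.00")
full_reply = join_prefill("<total>", "42.00</total>")  # continuation is made up
```

Starting the answer with the opening tag nudges the model to skip any preamble and go straight to the value.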
How is prompting for GPT different from Gemini?
GPT (OpenAI) prefers markdown headers, JSON mode, and native function calling for structured output, and it prioritizes later instructions when they conflict. Gemini (Google) is strongly multimodal: place images at the start of the prompt. It also offers a 1M+ token context and search grounding for fresh facts. So GPT with JSON is convenient for structured answers, while Gemini fits image-heavy or up-to-date tasks.
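The "images at the start" advice amounts to ordering the parts of a multimodal prompt before sending them. The sketch below uses a hypothetical `Part` type rather than any real SDK class, and simply moves non-text parts ahead of text while preserving their relative order.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Part:
    kind: str   # "text", "image", "video", ... (illustrative labels)
    value: str  # text content or a file reference


def media_first(parts: List[Part]) -> List[Part]:
    """Return parts with all non-text items first, keeping relative order."""
    media = [p for p in parts if p.kind != "text"]
    text = [p for p in parts if p.kind == "text"]
    return media + text


parts = [
    Part("text", "Extract the invoice total."),
    Part("image", "invoice_page_1.png"),  # hypothetical file name
    Part("image", "invoice_page_2.png"),
]
ordered = media_first(parts)
```

Mapping the ordered parts onto a real request is then a matter of converting each `Part` to whatever structure the provider's client expects.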
Why do open-source models need a chat template?
Llama, Qwen, and DeepSeek each expect an exact chat template with special tokens that mark the roles (system, user, assistant). Qwen uses the ChatML markers <|im_start|> and <|im_end|>; Llama and DeepSeek define their own tokens. Skip the template or use the wrong format and the model may ignore your system message entirely and produce garbage. Use the model's official template (usually via apply_chat_template in the tokenizer), keep prompts shorter, and format explicitly, since open-source models often handle long context worse than frontier models.
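To show what a chat template actually produces, here is a hand-written sketch of the ChatML layout built around <|im_start|> and <|im_end|>. It is for illustration only: in practice the tokenizer's apply_chat_template should generate this string, and models that do not use ChatML need their own template.

```python
from typing import Dict, List


def render_chatml(messages: List[Dict[str, str]], add_generation_prompt: bool = True) -> str:
    """Render role/content messages in the ChatML layout.

    Each turn becomes: <|im_start|>{role}\n{content}<|im_end|>\n
    """
    rendered = "".join(
        f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n" for m in messages
    )
    if add_generation_prompt:
        # Open an assistant turn so the model knows it should answer next.
        rendered += "<|im_start|>assistant\n"
    return rendered


chat = [
    {"role": "system", "content": "Be brief."},
    {"role": "user", "content": "Hi"},
]
prompt = render_chatml(chat)
```

Seeing the raw string makes it clear why a missing template matters: without the role markers, the system message is just more undifferentiated text.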