Image Generation Prompting
Master prompts for DALL-E, Midjourney & Stable Diffusion
The Problem: You type "a cat" into an image generator and get a generic, bland clipart-style cat. How do you get the specific image you have in your head — a fluffy orange tabby in a cozy library, painted in warm watercolor tones?
The Solution: A Creative Brief, Not a Wish
Writing an image prompt is like writing a creative brief for a designer. You would not tell an illustrator "draw something cool" — you would describe the subject, style, mood, and composition. Modern generators (Midjourney, DALL-E 3, Stable Diffusion) are text-to-image diffusion models: they start from random noise and remove it step by step, and at every step your text steers which direction the denoising goes. The more specific and structured your prompt, the more constraints the model has, and the closer the result lands to the picture in your head. The key formula is: Subject + Style + Details + Lighting + Composition. Each element narrows an otherwise infinite possibility space into something specific.
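The formula above can be sketched as a small data structure, one slot per element. This is a minimal illustration, not part of any generator's API: the class name `ImagePrompt`, its field names, and the comma-joined output format are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class ImagePrompt:
    """One slot per element of the formula; all names here are illustrative."""
    subject: str
    style: str = ""
    details: str = ""
    lighting: str = ""
    composition: str = ""

    def render(self) -> str:
        # Join the non-empty slots in formula order, separated by commas.
        parts = [self.subject, self.style, self.details,
                 self.lighting, self.composition]
        return ", ".join(p.strip() for p in parts if p.strip())


vague = ImagePrompt(subject="a cat")
specific = ImagePrompt(
    subject="a fluffy orange tabby cat curled up on a stack of old books",
    style="soft watercolor style",
    details="cozy library",
    lighting="warm afternoon light through a tall window",
    composition="shallow depth of field",
)

print(vague.render())
print(specific.render())
```

Leaving a slot empty makes the gap visible: the vague prompt fills one slot out of five, which is exactly the space the model is left to improvise in.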
What actually moves the needle
Not every word carries equal weight. The subject and the style/medium (photorealistic, oil painting, anime, 3D render) do the heavy lifting: they reshape the whole image. Lighting is the next most impactful lever: "golden hour" or "dramatic side light" can turn a flat scene into a striking one. Two technical controls matter too. Negative prompts (a dedicated field in Stable Diffusion, the --no parameter in Midjourney) tell the model what to avoid; "blurry, low quality, extra fingers, watermark" targets the most common artifacts. A seed fixes the starting noise, so reusing the same seed with the same prompt and settings gives a reproducible image, which you can then tweak one word at a time instead of re-rolling from scratch.
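Why a seed makes results reproducible can be shown with a toy stand-in. The sketch below does not call any image model: Python's standard `random` module plays the role of the noise source, and the function name `starting_noise` is hypothetical. The point is only that the same seed yields the same starting noise, while a different seed yields different noise.

```python
import random


def starting_noise(seed: int, size: int = 4) -> list[float]:
    """Toy stand-in for the initial noise a diffusion model starts from."""
    rng = random.Random(seed)  # independent generator, seeded explicitly
    return [rng.gauss(0.0, 1.0) for _ in range(size)]


same_a = starting_noise(seed=1234)
same_b = starting_noise(seed=1234)
other = starting_noise(seed=5678)

print(same_a == same_b)  # compares two runs with the same seed
print(same_a == other)   # compares runs with different seeds
```

In a real generator the prompt then steers the denoising from that fixed starting point, which is why changing one word while holding the seed isolates the effect of that word.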
Tradeoffs, pitfalls, and a worked example
The main pitfall is vagueness: "a cool dragon" gives you a different random dragon every run. The opposite pitfall is overloading — pile on twenty competing adjectives and the model averages them into mush or quietly drops half. Aim for a few decisive, non-contradictory descriptors. A concrete example: instead of "a cat," write "a fluffy orange tabby cat curled up on a stack of old books in a cozy library, warm afternoon light through a tall window, soft watercolor style, shallow depth of field" with a negative prompt of "blurry, deformed, text, watermark." Same model, but now subject, setting, lighting, medium, and composition are all pinned down — so the output is consistent and on-target instead of generic clipart.
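The two pitfalls above can be turned into a rough pre-flight check. Everything in this sketch is an assumption made for illustration: the function name `lint_prompt`, the thresholds, and the short list of contradictory pairs are hypothetical heuristics, not rules taken from any generator's documentation.

```python
# Hypothetical heuristics: the thresholds and the contradiction list below
# are illustrative assumptions, not values from any generator's docs.
MIN_DESCRIPTORS = 3
MAX_DESCRIPTORS = 12
CONTRADICTIONS = [
    ("photorealistic", "watercolor"),
    ("close-up", "wide shot"),
    ("golden hour", "midnight"),
]


def lint_prompt(prompt: str) -> list[str]:
    """Return human-readable warnings for a comma-separated prompt."""
    descriptors = [d.strip() for d in prompt.split(",") if d.strip()]
    text = prompt.lower()
    warnings = []
    if len(descriptors) < MIN_DESCRIPTORS:
        warnings.append("vague: add style, lighting, or composition")
    if len(descriptors) > MAX_DESCRIPTORS:
        warnings.append("overloaded: trim to a few decisive descriptors")
    for a, b in CONTRADICTIONS:
        if a in text and b in text:
            warnings.append(f"contradictory: '{a}' vs '{b}'")
    return warnings


print(lint_prompt("a cool dragon"))
print(lint_prompt("a cat, photorealistic, watercolor, close-up"))
```

A check like this cannot judge image quality; it only flags prompts that are likely to leave the model guessing or force it to reconcile incompatible instructions.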
Think of it like a creative brief for a designer:
- 1. Describe the subject clearly: Be specific. Write not "a dog" but "a golden retriever puppy sitting in a field of sunflowers, looking up"
- 2. Choose style and medium: Photorealistic, oil painting, anime, 3D render, pixel art — each produces dramatically different results from the same subject
- 3. Add lighting and mood: Golden hour, dramatic shadows, soft diffused light, neon glow. Lighting is the most impactful element after subject and style
- 4. Specify composition: Camera angle (close-up, wide shot), depth of field, rule of thirds — composition guides the viewer's eye
- 5. Use negative prompts wisely: In Stable Diffusion and Midjourney, specify what to avoid: "blurry, low quality, extra fingers, watermark" — removes common artifacts
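Step 5 looks different in each tool, which the following sketch makes concrete. It only formats text and calls no image service. The function names are hypothetical, and the dictionary keys `prompt` and `negative_prompt` are an assumed convention for "two separate fields"; the Midjourney form appends the exclusions after the --no parameter mentioned above.

```python
def for_stable_diffusion(prompt: str, avoid: list[str]) -> dict[str, str]:
    # Separate fields: the negative prompt never mixes into the main prompt.
    return {"prompt": prompt, "negative_prompt": ", ".join(avoid)}


def for_midjourney(prompt: str, avoid: list[str]) -> str:
    # Single string: exclusions are appended after the --no parameter.
    if not avoid:
        return prompt
    return f"{prompt} --no {', '.join(avoid)}"


prompt = "a golden retriever puppy sitting in a field of sunflowers, looking up"
avoid = ["blurry", "low quality", "extra fingers", "watermark"]

print(for_stable_diffusion(prompt, avoid))
print(for_midjourney(prompt, avoid))
```

Keeping the list of things to avoid separate from the main description means the same list can be reused across prompts and across tools.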
Anatomy of a Great Image Prompt
- Subject & Details: Start with a clear subject, then add attributes: age, pose, expression, clothing, surroundings
- Style & Medium: Oil painting, watercolor, 3D render, anime, photorealistic, pixel art — the style defines the entire mood
- Lighting & Atmosphere: Golden hour, dramatic side light, neon glow, soft diffused — lighting transforms ordinary scenes into striking images
- Composition & Camera: Close-up, wide angle, bird's-eye view, rule of thirds, bokeh, depth of field. These guide how the viewer sees the image
- Common Pitfall: Vague Prompts: "A cool dragon" gives random results. "A jade-green dragon perched on a volcanic cliff at sunset, cinematic lighting, by Greg Rutkowski" gives consistent, striking output
Fun Fact: DALL-E 3 automatically rewrites your prompt behind the scenes — when you type "cool dragon," it internally expands it to something like "majestic dragon with iridescent scales, perched on a mountain peak at sunset, fantasy art style, highly detailed." You can ask ChatGPT to show the expanded prompt to learn from it.
Try It Yourself!
Use the interactive prompt builder below to assemble a professional image prompt step by step and see how each element changes the result.
a fluffy orange cat, photorealistic, 8K, golden hour sunlight, close-up portrait
Frequently asked questions
How do I write a good image generation prompt?
Use the formula Subject + Style + Details + Lighting + Composition. Start with a clear main subject, then add a style or medium (photorealistic, oil painting, anime, 3D render), lighting (golden hour, soft light), camera angle, and details. The more specific you are, the closer the result to your vision. A vague 'a cool dragon' gives random output, while a detailed description gives consistent, predictable results.
What is a negative prompt and why use it?
A negative prompt lists what the model should NOT include. Common values are 'blurry, low quality, extra fingers, watermark, deformed', which remove the most frequent artifacts like blur, mangled hands, and watermarks. Stable Diffusion has a dedicated negative prompt field, while Midjourney uses the --no parameter (for example, --no text).
What is the difference between Midjourney, DALL-E 3, and Stable Diffusion?
All three are text-to-image diffusion models with different personalities. Midjourney gives the most artistic, polished results out of the box and runs through Discord. DALL-E 3 is built into ChatGPT, understands natural language well, and rewrites your prompt automatically. Stable Diffusion is free and runs locally, offering maximum control (seed, negative prompt, weights) but requiring more detailed setup.
Why does the generator draw something different from what I asked?
Usually the cause is a vague or overloaded prompt. A too-general description ('a cat') leaves the model free to improvise, while too many contradictory adjectives get averaged into mush or partly ignored. The fix: a few decisive, non-contradictory details, an explicit style and lighting, a negative prompt for unwanted elements, and a fixed seed so you can refine the result one word at a time.