Code Generation

Writing code with AI

The Problem: Writing code is time-consuming, and it's easy to forget syntax or make mistakes. How can AI help write code faster and more reliably?

The Solution: Your AI Junior Developer

Code Generation uses LLMs to write, complete, and transform code based on natural language descriptions. The model was trained on billions of lines of public source code, so it has learned the statistical patterns of how real programs are structured: common API signatures, idiomatic loops, error-handling conventions, and the way tests usually look. When you describe what you want, the model predicts the most probable sequence of tokens that satisfies that description — it is not reasoning about your program the way a compiler does, it is pattern-matching against everything it has seen. That distinction explains both why it is so fast and why it can be confidently wrong.

How it works in practice

The quality of the output depends almost entirely on how much context you give. A bare one-line request leaves the model guessing about the language, the libraries, the naming style, and the edge cases. Two cheap techniques fix most of this. With chain-of-thought prompting you ask the model to outline its approach before writing code, which surfaces flawed assumptions early. With few-shot examples you paste one or two existing functions from your codebase so the output matches your conventions instead of generic Stack Overflow style. Modern coding agents go further: they read neighbouring files, run the tests, see the failure, and patch the code in a loop, which is why an agent with access to your project usually beats a single chat completion.
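The generate, test, and patch loop described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not how any particular agent is implemented: `generate`, `runTests`, and `generateUntilGreen` are hypothetical names, and a real agent would plug an LLM API call and an actual test runner into them.

```typescript
// Hypothetical interfaces: stand-ins for an LLM call and a test runner.
type Generate = (prompt: string) => Promise<string>;
interface TestResult {
  passed: boolean;
  output: string;
}
type RunTests = (code: string) => Promise<TestResult>;

interface LoopResult {
  code: string;
  passed: boolean;
  attempts: number;
}

// Ask for code, run the tests, and feed any failure back into the next prompt.
async function generateUntilGreen(
  task: string,
  generate: Generate,
  runTests: RunTests,
  maxAttempts = 3
): Promise<LoopResult> {
  let prompt = task;
  let code = "";
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    code = await generate(prompt);
    const result = await runTests(code);
    if (result.passed) {
      return { code, passed: true, attempts: attempt };
    }
    // Include the failing code and the test output so the model can patch it.
    prompt =
      `${task}\n\nPrevious attempt:\n${code}\n\n` +
      `Test output:\n${result.output}\n\nFix the code.`;
  }
  // Give up after maxAttempts and return the last draft for human review.
  return { code, passed: false, attempts: maxAttempts };
}
```

The attempt cap matters: without it, a model that keeps producing the same wrong patch would loop forever, so the sketch hands the last draft back to a human instead.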

Tradeoffs and a worked example

The danger is that generated code looks correct. The model produces plausible syntax even when the logic is subtly wrong, and it can invent library functions that do not exist, a coding-specific form of hallucination. Treat every output as a draft from an intern: read it, run it, and test the edge cases. Concretely, ask for "a Python function to validate an email address" and many models hand you a one-line regex. Depending on the pattern, it accepts a@b.c but rejects perfectly valid addresses like user+tag@sub.domain.co, or it accepts garbage like a@@b.com. The fix is to drive the spec yourself: state the input/output types, name the edge cases (empty string, unicode, plus-addressing, very long input), and ask for unit tests. The model is excellent at writing the boilerplate around a precise specification; it is poor at inventing the specification for you.
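To make the spec-first approach concrete, here is a rough sketch of what a precise specification plus its tests might produce. It is written in TypeScript to match the lesson's other code samples. The rules are simplified assumptions chosen for illustration (ASCII only, exactly one "@", at least two domain labels, length caps of 64 and 254), not a full RFC 5322 implementation, and `isValidEmail` is a hypothetical name.

```typescript
// Assumed, simplified rules for illustration; NOT a complete RFC 5322 validator.
const MAX_TOTAL_LENGTH = 254;
const MAX_LOCAL_LENGTH = 64;
const LOCAL_PART = /^[A-Za-z0-9._%+-]+$/; // allows plus-addressing
const DOMAIN_LABEL = /^[A-Za-z0-9]([A-Za-z0-9-]*[A-Za-z0-9])?$/;

function isValidEmail(input: string): boolean {
  // Edge cases named in the spec: empty string and very long input.
  if (input.length === 0 || input.length > MAX_TOTAL_LENGTH) return false;

  // Require exactly one "@" with a non-empty local part before it.
  const at = input.indexOf("@");
  if (at <= 0 || at !== input.lastIndexOf("@")) return false;

  const local = input.slice(0, at);
  const domain = input.slice(at + 1);
  if (local.length > MAX_LOCAL_LENGTH || !LOCAL_PART.test(local)) return false;

  // The domain needs at least two labels, none of them empty.
  const labels = domain.split(".");
  if (labels.length < 2) return false;
  return labels.every((label) => DOMAIN_LABEL.test(label));
}

// Unit tests derived directly from the edge cases in the specification.
const emailCases: Array<[string, boolean]> = [
  ["a@b.c", true],
  ["user+tag@sub.domain.co", true],
  ["a@@b.com", false],
  ["", false],
  ["jösé@example.com", false], // rejected only because of the ASCII-only assumption
  ["a".repeat(300) + "@b.co", false],
];
for (const [input, expected] of emailCases) {
  if (isValidEmail(input) !== expected) {
    throw new Error(`Unexpected result for: ${input.slice(0, 30)}`);
  }
}
```

Whether unicode addresses should pass is exactly the kind of decision the model cannot make for you; here it is settled by an explicit assumption and pinned down by a test.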

Think of it like a programming intern:

1. Write specification: "Write a function to validate email addresses — return true/false"
2. Include types and signatures: Specify input/output types, language, and expected interface
3. Add edge cases: Empty input, unicode, very long strings, concurrent access
4. AI generates code: Complete implementation with types, error handling, and docs
5. Review for subtle bugs: Check for race conditions, null checks, off-by-one errors, and missing cleanup

Where Is This Used?

Code Completion: IDE integrations like GitHub Copilot
Boilerplate Generation: Creating repetitive code structures
Language Translation: Converting code between languages
Test Generation: Writing unit tests from function signatures

Gotchas

Subtle Bugs: LLM-generated code can look correct but hide race conditions, off-by-one errors, missing error handling, or incorrect edge case logic. Always review critically.
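An off-by-one error of the kind mentioned above is easiest to catch with boundary tests. The sketch below is illustrative only: `paginate` is a hypothetical helper, chosen because index arithmetic is where such slips tend to hide.

```typescript
// Hypothetical helper of the kind an LLM might generate: return one page of items.
function paginate<T>(items: readonly T[], page: number, pageSize: number): T[] {
  if (!Number.isInteger(page) || page < 1) {
    throw new RangeError("page must be an integer >= 1");
  }
  if (!Number.isInteger(pageSize) || pageSize < 1) {
    throw new RangeError("pageSize must be an integer >= 1");
  }
  // Pages are 1-based, so page 1 starts at index 0.
  // Writing `page * pageSize` here is a typical off-by-one slip.
  const start = (page - 1) * pageSize;
  return items.slice(start, start + pageSize);
}

// Boundary checks: first page, last partial page, and one page past the end.
const sampleItems = [1, 2, 3, 4, 5];
console.log(paginate(sampleItems, 1, 2)); // [1, 2]
console.log(paginate(sampleItems, 3, 2)); // [5]
console.log(paginate(sampleItems, 4, 2)); // []
```

A version with the `page * pageSize` bug would still return plausible-looking data for the first page, which is why the checks target the first and last pages rather than something in the middle.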

Fun Fact: LLM-generated code has a dangerous property: it looks plausible at first glance. Some studies suggest developers accept AI-generated code with less scrutiny than human-written code, yet it can contain subtle logical errors such as swapped comparison operators, missing null checks, or async code that works 99% of the time but deadlocks under load.

Try It Yourself!

The example below shows the baseline: a bare one-line prompt and the minimal code it produces. In the interactive version, each toggle adds one requirement to the prompt and changes the generated code accordingly.

Task: Build a function that fetches user data from an API

Your Prompt (1 part)

Write a function to fetch user data from /api/users/:id.

Generated Code (Base version)

async function getUser(id) {
  const res = await fetch('/api/users/' + id);
  const data = await res.json();
  return data;
}

Code quality: Minimal
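For contrast with the base version, here is a rough sketch of what the output might look like once the prompt also asks for types, error handling, and input validation. The `User` shape, the `isUser` guard, and the injected `fetchImpl` parameter are assumptions made for illustration; the lesson itself does not define the API's response format.

```typescript
// Hypothetical shape of a user record; the real API may differ.
interface User {
  id: string;
  name: string;
}

// Minimal structural types so the sketch does not depend on DOM typings.
type HttpResponse = { ok: boolean; status: number; json(): Promise<unknown> };
type FetchLike = (url: string) => Promise<HttpResponse>;

// Runtime check: API responses are untrusted until validated.
function isUser(value: unknown): value is User {
  if (typeof value !== "object" || value === null) return false;
  const record = value as Record<string, unknown>;
  return typeof record["id"] === "string" && typeof record["name"] === "string";
}

async function getUser(id: string, fetchImpl: FetchLike): Promise<User> {
  if (id.trim() === "") {
    throw new Error("id must be a non-empty string");
  }

  // Encode the id so characters like "/" cannot change the request path.
  const res = await fetchImpl("/api/users/" + encodeURIComponent(id));
  if (!res.ok) {
    throw new Error(`Request failed with status ${res.status}`);
  }

  const data: unknown = await res.json();
  if (!isUser(data)) {
    throw new Error("Unexpected response shape");
  }
  return data;
}
```

In a browser or a recent Node.js runtime, the global `fetch` can be passed as `fetchImpl`; taking it as a parameter also makes the function easy to test with a stub.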

Key Insight

1. Specificity = quality. A bare "Write a function" gets you the bare minimum; each added requirement improves the output.
2. If you don't ask for it, you won't get it: LLMs take the shortest path.
3. Good prompts are checklists, not essays: each toggle is one line in the prompt.
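The "checklist, not essay" idea can be sketched as a small prompt builder in which each enabled option contributes exactly one line. The option names below (`language`, `types`, `errorHandling`, `edgeCases`, `tests`) and the `buildPrompt` function are hypothetical and do not necessarily match the toggles in the interactive demo.

```typescript
// Hypothetical options; each one that is set adds one line to the prompt.
interface PromptOptions {
  language?: string;
  types?: boolean;
  errorHandling?: boolean;
  edgeCases?: string[];
  tests?: boolean;
}

function buildPrompt(task: string, options: PromptOptions = {}): string {
  const lines: string[] = [task];
  if (options.language) {
    lines.push(`- Language: ${options.language}`);
  }
  if (options.types) {
    lines.push("- Declare explicit input and output types.");
  }
  if (options.errorHandling) {
    lines.push("- Handle network and HTTP errors explicitly.");
  }
  if (options.edgeCases && options.edgeCases.length > 0) {
    lines.push(`- Cover these edge cases: ${options.edgeCases.join(", ")}.`);
  }
  if (options.tests) {
    lines.push("- Include unit tests.");
  }
  return lines.join("\n");
}

// With no options, the prompt is just the bare task.
console.log(buildPrompt("Write a function to fetch user data from /api/users/:id."));

// With options, every requirement is a separate, checkable line.
console.log(
  buildPrompt("Write a function to fetch user data from /api/users/:id.", {
    language: "TypeScript",
    types: true,
    errorHandling: true,
    edgeCases: ["empty id", "HTTP 404"],
  })
);
```

Keeping one requirement per line makes it easy to check the generated code against the prompt during review: every line is something the output either does or does not satisfy.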

Frequently asked questions

How does the prompt affect generated code quality?

Dramatically. Specifying language, types, error handling, edge cases, and desired patterns in your prompt can shift output from prototype-quality to production-ready code with proper structure and documentation.

Should I review LLM-generated code?

Always. LLMs can produce code with subtle bugs, security vulnerabilities, or incorrect assumptions. Treat generated code as a draft from a junior developer — review, test, and validate before using.

Which programming languages do LLMs generate best?

Python, JavaScript/TypeScript, and Java are heavily represented in public code and tend to give the best results. Less common languages (Rust, Haskell, Elixir) work but typically with more errors. Always verify syntax and idioms.

How can I use LLMs for refactoring existing code?

Provide the code with explicit instructions: 'Refactor to reduce duplication', 'Convert to async/await', 'Add TypeScript types'. Include context about patterns and constraints. Review diffs carefully.

Technique Comparison (pre-recorded responses)

Task (Beginner, Coding): Write a function to filter an array of objects by multiple criteria

Without technique

Prompt

Write an array filter function.

Response

function filter(arr, key, value) { return arr.filter(item => item[key] === value); }

Tokens: 12/35

Time: 280 ms

With Code Generation

With technique

Prompt

Write a TypeScript function filterByMultipleCriteria that:
1. Takes an array of objects and a criteria object (key → value or array of values)
2. Supports exact match and array inclusion check
3. Returns a typed result
4. Includes JSDoc with usage example

Usage example: filterByMultipleCriteria(users, { role: "admin", status: ["active", "pending"] })

Constraints: strict TypeScript, no any, cover edge case of empty criteria object.

Response

type Criteria<T> = {
  [K in keyof T]?: T[K] | T[K][];
};

/**
 * Filters an array of objects by multiple criteria.
 * A criterion can be a value (exact match) or array (inclusion check).
 *
 * @example
 * const users = [{ name: "Alice", role: "admin", status: "active" }];
 * filterByMultipleCriteria(users, { role: "admin", status: ["active", "pending"] });
 * // => [{ name: "Alice", role: "admin", status: "active" }]
 */
function filterByMultipleCriteria<T extends Record<string, unknown>>(
  items: T[],
  criteria: Criteria<T>
): T[] {
  const entries = Object.entries(criteria);
  if (entries.length === 0) return [...items];

  return items.filter((item) =>
    entries.every(([key, expected]) => {
      const actual = item[key];
      return Array.isArray(expected)
        ? expected.includes(actual)
        : actual === expected;
    })
  );
}

👁️ Concrete usage example defines the expected API: key → value or array

🧠 Typing and JSDoc requirement makes the model think through the generic Criteria<T> type

🔍 Edge case (empty criteria) explicitly stated → model handles it instead of ignoring it

Tokens: 120/210

Time: 980 ms

Why this works

A structured prompt with a usage example, typing constraints, and an explicit edge case transforms a one-liner without types into a production-ready generic function.
