Neural Networks
How billions of simple calculations create intelligence
The Problem: You keep hearing "neural network" everywhere, but what actually IS a neuron in AI? How do billions of simple calculations combine to generate human-like text? Understanding neural networks is the foundation for understanding why LLMs work — and why they sometimes don't.
The Solution: How Neural Networks Work
A neural network is a system of interconnected artificial neurons organized in layers. Each neuron computes a weighted sum of its inputs, adds a bias, and passes the result through an activation function (like ReLU or sigmoid). The key insight: without nonlinear activation functions, stacking 100 layers would collapse into a single linear transformation — activation functions are what make "deep" learning possible. During training, backpropagation uses the chain rule to compute how much each of billions of weights contributed to the error, then adjusts them via gradient descent.
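The computation a single neuron performs can be sketched in a few lines of plain Python. The input values, weights, and zero bias below are made-up illustrative numbers, not taken from any real model:

```python
# One artificial neuron: weighted sum + bias, passed through ReLU.
def relu(z):
    return max(0.0, z)

def neuron(inputs, weights, bias):
    # weighted sum of inputs, plus bias, through the activation function
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return relu(z)

out = neuron([0.7, 0.3, 0.9], [0.5, -0.3, 0.8], bias=0.0)
print(out)  # ~0.98: 0.7*0.5 + 0.3*(-0.3) + 0.9*0.8
```

Swapping `relu` for the identity function would make any stack of such neurons collapse into one linear map — which is exactly why the nonlinearity matters.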
Think of it like a factory assembly line — each worker (neuron) does one simple task like adding up inputs and deciding yes or no, but ~120 stations of workers in a chain, each refining the previous output, collectively turn raw text into meaningful predictions:
- 1. Input layer receives data: Your prompt gets converted into numbers (via tokenization and embeddings) and fed into the input layer. In an LLM, this means token IDs become embedding vectors — each number represents a feature of the input
- 2. Hidden layers transform representations: Each hidden layer applies weights, biases, and activation functions to transform the data. GPT-4 has ~120 such layers. Early layers detect simple patterns (grammar, common phrases); deeper layers capture meaning, context, and reasoning patterns
- 3. Output layer produces probabilities: The final layer uses softmax to convert raw scores (logits) into a probability distribution over the entire vocabulary (~100K tokens). The model then samples the next token based on these probabilities — this is where temperature affects output
- 4. Backpropagation trains the network: During training, the error between predicted and actual next token flows backward through all layers. The chain rule computes gradients for every weight, and gradient descent adjusts them. Repeat billions of times across trillions of tokens — that is how an LLM learns
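Steps 1-3 above can be sketched as a tiny forward pass in plain Python. All weights and inputs here are made-up numbers, and the "vocabulary" has only two tokens — a toy stand-in for the real ~100K-token softmax:

```python
import math

def relu_vec(v):
    return [max(0.0, z) for z in v]

def dense(inputs, weights, biases):
    # one layer: each neuron takes a weighted sum of inputs plus its bias
    return [sum(x * w for x, w in zip(inputs, ws)) + b
            for ws, b in zip(weights, biases)]

def softmax(logits):
    m = max(logits)                         # subtract max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

x = [0.7, 0.3, 0.9]                         # step 1: embedded input
h = relu_vec(dense(x, [[0.5, -0.3, 0.8],    # step 2: hidden layer transform
                       [0.1, 0.4, -0.2]], [0.0, 0.0]))
logits = dense(h, [[1.0, -1.0],             # step 3: output layer -> logits
                   [-0.5, 0.5]], [0.0, 0.0])
probs = softmax(logits)
print(probs)  # probability distribution over a 2-"token" vocabulary, sums to 1
```

Step 4 (backpropagation) would then compare `probs` against the correct next token and push the error backward through `dense` and `relu_vec` to adjust every weight.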
Neural Networks in LLMs
- Text Generation: Every LLM is a neural network. GPT-4 has ~1.8 trillion parameters (weights on connections between neurons) organized in ~120 layers. When you chat with ChatGPT, your prompt flows through all these layers in a forward pass, and the softmax at the output produces the next token
- Temperature Control: When you adjust temperature in LLM settings, you are modifying the softmax activation function at the output layer. Temperature=0 makes the highest-probability token almost certain; temperature=2 flattens the distribution, making output more creative but less predictable
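The temperature mechanism described above amounts to dividing the logits by T before the softmax. A minimal sketch with made-up logits (note that in practice T=0 is handled as a plain argmax, since dividing by zero is undefined):

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    # divide logits by T before softmax: small T sharpens, large T flattens
    scaled = [z / temperature for z in logits]
    m = max(scaled)                          # subtract max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]                     # made-up logits for 3 tokens
print(softmax_with_temperature(logits, 0.2)) # near one-hot on the top token
print(softmax_with_temperature(logits, 2.0)) # much flatter distribution
```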
- Model Size & Parameters: When people say a model has "7 billion parameters," they mean 7 billion learnable weights on the connections between neurons. More parameters = more capacity to store patterns, but also more compute and memory needed. Llama 3 70B has 70 billion weights across 80 layers
- Common Pitfall: Neural networks are not "thinking" like humans. Each forward pass is a fixed mathematical computation — no reasoning happens inside. The illusion of intelligence emerges from patterns learned across trillions of tokens during training via backpropagation
Fun Fact: GPT-3 was trained with backpropagation across 175 billion parameters using 300 billion tokens of text. The same algorithm popularized in 1986 — just applied at an unimaginable scale. A simple 3-4-2 network has 26 parameters; GPT-4 has roughly 1.8 trillion.
Try It Yourself!
Explore the interactive neural network below: adjust inputs, watch the forward pass, toggle activation functions, and see how backpropagation adjusts weights.
This network decides if an email is spam. Adjust the inputs — how "spammy" is this email?
Describe the email (0 = no, 1 = definitely):
Try it: Set "Win $$$" and "Suspicious links" to max, "Known sender" to 0. The network classifies it as spam. Now flip them — known sender high, the rest low. Gmail does the same, just with millions of inputs instead of 3.
Total parameters: This tiny network has 3x4 + 4 + 4x2 + 2 = 26 parameters. GPT-4 has ~1.8 trillion. Same principle, vastly different scale.
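The parameter count above generalizes to any fully connected architecture: weights between each pair of adjacent layers, plus one bias per non-input neuron. A quick sketch:

```python
def param_count(layer_sizes):
    # weights between consecutive layers, plus one bias per non-input neuron
    weights = sum(a * b for a, b in zip(layer_sizes, layer_sizes[1:]))
    biases = sum(layer_sizes[1:])
    return weights + biases

print(param_count([3, 4, 2]))  # 26 for the 3-4-2 demo network above
```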
Get a clear explanation of neural networks for a non-technical audience
A generic prompt gets an abstract description: "Neural networks are machine learning models inspired by the brain. They consist of nodes and connections. They are used in deep learning for various tasks such as pattern recognition and text processing." The prompt above produces something far more concrete:
One neuron:
- Receives inputs (x1=0.7, x2=0.3, x3=0.9)
- Multiplies by weights (w1=0.5, w2=-0.3, w3=0.8)
- Sums: 0.7*0.5 + 0.3*(-0.3) + 0.9*0.8 = 0.35 - 0.09 + 0.72 = 0.98
- Applies ReLU: max(0, 0.98) = 0.98 → passes forward
GPT-4 scale:
- ~1.8 trillion parameters feeding such operations
- ~120 layers (like 120 assembly stations)
- Each layer refines understanding: early ones — grammar, deep ones — meaning
- Softmax at output gives probabilities over ~100K tokens
Analogy: Assembly line with 120 stations. Each worker does one simple operation, but a chain of 1.8 trillion workers transforms raw text into a meaningful response.
A "simple to complex" prompt (one neuron → billions) with specific numbers and an analogy produces a structured, memorable answer instead of an abstract description.