Neural Networks
How billions of simple calculations create intelligence
The Problem: You keep hearing "neural network" everywhere, but what actually IS a neuron in AI? How do billions of simple calculations combine to generate human-like text? Understanding neural networks is the foundation for understanding why LLMs work — and why they sometimes don't.
The Solution: How Neural Networks Work
A neural network is a system of interconnected artificial neurons organized in layers. Each neuron computes a weighted sum of its inputs, adds a bias, and passes the result through an activation function (like ReLU or sigmoid). The key insight: without nonlinear activation functions, stacking 100 layers would collapse into a single linear transformation — activation functions are what make "deep" learning possible. During training, backpropagation uses the chain rule to compute how much each of billions of weights contributed to the error, then adjusts them via gradient descent.
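The computation a single neuron performs can be sketched in a few lines of plain Python. The input values, weights, and zero bias below are made-up illustrative numbers, not taken from any real model:

```python
# One artificial neuron: weighted sum + bias, passed through ReLU.
def relu(z):
    return max(0.0, z)

def neuron(inputs, weights, bias):
    # weighted sum of inputs, plus bias, through the activation function
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return relu(z)

out = neuron([0.7, 0.3, 0.9], [0.5, -0.3, 0.8], bias=0.0)
print(out)  # ~0.98: 0.7*0.5 + 0.3*(-0.3) + 0.9*0.8
```

Swapping `relu` for the identity function would make any stack of such neurons collapse into one linear map — which is exactly why the nonlinearity matters.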
Think of it like a factory assembly line — each worker (neuron) does one simple task like adding up inputs and deciding yes or no, but ~120 stations of workers in a chain, each refining the previous output, collectively turn raw text into meaningful predictions:
- 1. Input layer receives data: Your prompt gets converted into numbers (via tokenization and embeddings) and fed into the input layer. In an LLM, this means token IDs become embedding vectors — each number represents a feature of the input
- 2. Hidden layers transform representations: Each hidden layer applies weights, biases, and activation functions to transform the data. GPT-4 has ~120 such layers. Early layers detect simple patterns (grammar, common phrases); deeper layers capture meaning, context, and reasoning patterns
- 3. Output layer produces probabilities: The final layer uses softmax to convert raw scores (logits) into a probability distribution over the entire vocabulary (~100K tokens). The model then samples the next token based on these probabilities — this is where temperature affects output
- 4. Backpropagation trains the network: During training, the error between predicted and actual next token flows backward through all layers. The chain rule computes gradients for every weight, and gradient descent adjusts them. Repeat billions of times across trillions of tokens — that is how an LLM learns
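Steps 1-3 above can be sketched as a tiny forward pass in plain Python. All weights and inputs here are made-up numbers, and the "vocabulary" has only two tokens — a toy stand-in for the real ~100K-token softmax:

```python
import math

def relu_vec(v):
    return [max(0.0, z) for z in v]

def dense(inputs, weights, biases):
    # one layer: each neuron takes a weighted sum of inputs plus its bias
    return [sum(x * w for x, w in zip(inputs, ws)) + b
            for ws, b in zip(weights, biases)]

def softmax(logits):
    m = max(logits)                         # subtract max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

x = [0.7, 0.3, 0.9]                         # step 1: embedded input
h = relu_vec(dense(x, [[0.5, -0.3, 0.8],    # step 2: hidden layer transform
                       [0.1, 0.4, -0.2]], [0.0, 0.0]))
logits = dense(h, [[1.0, -1.0],             # step 3: output layer -> logits
                   [-0.5, 0.5]], [0.0, 0.0])
probs = softmax(logits)
print(probs)  # probability distribution over a 2-"token" vocabulary, sums to 1
```

Step 4 (backpropagation) would then compare `probs` against the correct next token and push the error backward through `dense` and `relu_vec` to adjust every weight.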
Neural Networks in LLMs
- Text Generation: Every LLM is a neural network. GPT-4 has ~1.8 trillion parameters (weights on connections between neurons) organized in ~120 layers. When you chat with ChatGPT, your prompt flows through all these layers in a forward pass, and the softmax at the output produces the next token
- Temperature Control: When you adjust temperature in LLM settings, you are modifying the softmax activation function at the output layer. Temperature=0 makes the highest-probability token almost certain; temperature=2 flattens the distribution, making output more creative but less predictable
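The temperature mechanism described above amounts to dividing the logits by T before the softmax. A minimal sketch with made-up logits (note that in practice T=0 is handled as a plain argmax, since dividing by zero is undefined):

```python
import math

def softmax_with_temperature(logits, temperature=1.0):
    # divide logits by T before softmax: small T sharpens, large T flattens
    scaled = [z / temperature for z in logits]
    m = max(scaled)                          # subtract max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.1]                     # made-up logits for 3 tokens
print(softmax_with_temperature(logits, 0.2)) # near one-hot on the top token
print(softmax_with_temperature(logits, 2.0)) # much flatter distribution
```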
- Model Size & Parameters: When people say a model has "7 billion parameters," they mean 7 billion learnable weights on the connections between neurons. More parameters = more capacity to store patterns, but also more compute and memory needed. Llama 3 70B has 70 billion weights across 80 layers
- Common Pitfall: Neural networks are not "thinking" like humans. Each forward pass is a fixed mathematical computation — no reasoning happens inside. The illusion of intelligence emerges from patterns learned across trillions of tokens during training via backpropagation
Fun Fact: GPT-3 was trained with backpropagation across 175 billion parameters using 300 billion tokens of text. The same algorithm popularized in 1986 — just applied at an unimaginable scale. A simple 3-4-2 network has 26 parameters; GPT-4 has roughly 1.8 trillion.
Try It Yourself!
Explore the interactive neural network below: adjust inputs, watch the forward pass, toggle activation functions, and see how backpropagation adjusts weights.
This network decides if an email is spam. Adjust the inputs — how "spammy" is this email?
Describe the email (0 = no, 1 = definitely):
Try it: Set "Win $$$" and "Suspicious links" to max, "Known sender" to 0. The network classifies it as spam. Now flip them — known sender high, the rest low. Gmail does the same, just with millions of inputs instead of 3.
Total parameters: This tiny network has 3x4 + 4 + 4x2 + 2 = 26 parameters. GPT-4 has ~1.8 trillion. Same principle, vastly different scale.
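The parameter count above generalizes to any fully connected architecture: weights between each pair of adjacent layers, plus one bias per non-input neuron. A quick sketch:

```python
def param_count(layer_sizes):
    # weights between consecutive layers, plus one bias per non-input neuron
    weights = sum(a * b for a, b in zip(layer_sizes, layer_sizes[1:]))
    biases = sum(layer_sizes[1:])
    return weights + biases

print(param_count([3, 4, 2]))  # 26 for the 3-4-2 demo network above
```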
Get a clear explanation of neural networks for a non-technical audience
A generic prompt gets an abstract description: "Neural networks are machine learning models inspired by the brain. They consist of nodes and connections. They are used in deep learning for various tasks such as pattern recognition and text processing." The prompt above produces something far more concrete:
One neuron:
- Receives inputs (x1=0.7, x2=0.3, x3=0.9)
- Multiplies by weights (w1=0.5, w2=-0.3, w3=0.8)
- Sums: 0.7*0.5 + 0.3*(-0.3) + 0.9*0.8 = 0.35 - 0.09 + 0.72 = 0.98
- Applies ReLU: max(0, 0.98) = 0.98 → passes forward
GPT-4 scale:
- ~1.8 trillion parameters feeding such operations
- ~120 layers (like 120 assembly stations)
- Each layer refines understanding: early ones — grammar, deep ones — meaning
- Softmax at output gives probabilities over ~100K tokens
Analogy: Assembly line with 120 stations. Each worker does one simple operation, but a chain of 1.8 trillion workers transforms raw text into a meaningful response.
A "simple to complex" prompt (one neuron → billions) with specific numbers and an analogy produces a structured, memorable answer instead of an abstract description.