Backpropagation explained by coding it

Backpropagation is the algorithm used to train almost every neural network, and it has a reputation for being hard. It is not. It is the chain rule from calculus, applied backward through a sequence of operations. The fastest way to understand it for good is to code a tiny version yourself.

The one idea

Training means adjusting weights to reduce a loss. To adjust a weight, you need to know how the loss changes when that weight changes, which is its gradient. Backprop is just an efficient way to compute every weight's gradient by working backward from the loss.
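To make "how the loss changes when that weight changes" concrete, here is a minimal sketch that estimates a gradient the slow way: nudge the weight a little and watch the loss. The one-weight model in toy_loss, the helper numerical_gradient, and all the numbers are made up for illustration. Backprop arrives at the same value without re-running the loss once per weight.

```python
def toy_loss(w):
    # Hypothetical one-weight model: prediction = w * 3.0, target = 6.0
    prediction = w * 3.0
    return (prediction - 6.0) ** 2


def numerical_gradient(f, w, eps=1e-6):
    # Central difference: slope of f between w - eps and w + eps
    return (f(w + eps) - f(w - eps)) / (2 * eps)


w = 1.0
grad = numerical_gradient(toy_loss, w)
print(grad)  # close to -18.0, the exact derivative 6 * (3w - 6) at w = 1
```

The gradient is negative here, so increasing w would lower the loss. That sign is what tells training which way to move.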

Forward pass: data flows in, predictions and loss come out. Backward pass: gradients flow back, from the loss to every parameter.

The chain rule, concretely

If the loss depends on a, and a depends on w, then how much the loss changes with w is the product of the two local rates of change:

d(loss)/d(w) = d(loss)/d(a) * d(a)/d(w)

That is the whole trick. Each operation knows its own local derivative; backprop multiplies them along the path from the loss back to each weight.
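You can check the product rule above numerically. This sketch assumes the simplest possible chain, a = w * x followed by a squared-error loss; the function names and values are illustrative only. It multiplies the two local derivatives and compares the result with a brute-force finite-difference estimate.

```python
x, y = 2.0, 1.0  # illustrative input and target


def a_of_w(w):
    return w * x            # local derivative: da/dw = x


def loss_of_a(a):
    return (a - y) ** 2     # local derivative: dloss/da = 2 * (a - y)


w = 1.5
a = a_of_w(w)

# Chain rule: multiply the local derivatives along the path loss <- a <- w
dloss_da = 2 * (a - y)
da_dw = x
chain_rule_grad = dloss_da * da_dw

# Brute-force check: nudge w and measure the change in the loss directly
eps = 1e-6
finite_diff_grad = (loss_of_a(a_of_w(w + eps)) - loss_of_a(a_of_w(w - eps))) / (2 * eps)

print(chain_rule_grad, finite_diff_grad)  # the two should agree closely
```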

A tiny worked version

Take one neuron: z = w*x + b, then a = sigmoid(z), then a squared-error loss comparing a to the target y. The forward pass computes the loss; the backward pass then computes each gradient, in order:

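import math

# example values (assumed for illustration)
x, y = 1.5, 1.0      # input and target
w, b = 0.8, -0.2     # weight and bias

# forward
z = w * x + b
a = 1 / (1 + math.exp(-z))
loss = (a - y) ** 2

# backward (chain rule, step by step)
dloss_da = 2 * (a - y)        # derivative of squared error
da_dz    = a * (1 - a)        # derivative of sigmoid
dz_dw    = x                  # derivative of w*x + b with respect to w
grad_w   = dloss_da * da_dz * dz_dw
grad_b   = dloss_da * da_dz   # dz/db = 1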

Then you nudge each parameter against its gradient (w -= lr * grad_w, b -= lr * grad_b, where lr is a small learning rate) and repeat. Scale this from one neuron to a layer to a network, and you have essentially what frameworks do, just with more bookkeeping.
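Here is what "nudge and repeat" can look like as a loop, as a minimal sketch rather than a real training setup. It assumes a single training example, and the starting values, learning rate, and step count are picked arbitrarily for illustration.

```python
import math


def sigmoid(z):
    return 1 / (1 + math.exp(-z))


# Illustrative values: one training example, arbitrary starting parameters
x, y = 1.5, 1.0
w, b = 0.8, -0.2
lr = 0.5

losses = []
for step in range(200):
    # forward
    z = w * x + b
    a = sigmoid(z)
    loss = (a - y) ** 2
    losses.append(loss)

    # backward: the factors shared by both gradients are computed once
    dloss_dz = 2 * (a - y) * a * (1 - a)
    grad_w = dloss_dz * x
    grad_b = dloss_dz

    # update: step each parameter against its gradient
    w -= lr * grad_w
    b -= lr * grad_b

print(losses[0], losses[-1])  # the last loss should be much smaller than the first
```

Notice that dloss_dz is reused for both parameters. That reuse of shared factors is where backprop gets its efficiency once the network has many weights.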

Why coding it matters

Once you have written the backward pass by hand, a lot of deep learning stops being mysterious: why gradients can vanish (many small derivatives multiplied together), why activation choice matters (its derivative is a factor in every gradient), and what an optimizer is actually doing. You will read framework code and recognize the machinery instead of trusting it blindly.
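The vanishing-gradient point is easy to see with numbers. This sketch assumes a stack of sigmoid layers and looks only at the activation-derivative factors, ignoring the weight factors that also appear in a real network. It uses the best case for sigmoid, where each factor is at its maximum of 0.25.

```python
import math


def sigmoid(z):
    return 1 / (1 + math.exp(-z))


def sigmoid_derivative(z):
    s = sigmoid(z)
    return s * (1 - s)


# The sigmoid derivative peaks at z = 0, where it equals 0.25
best_case = sigmoid_derivative(0.0)

# One derivative factor per layer, multiplied together as in the chain rule
factors = {depth: best_case ** depth for depth in (1, 5, 10, 20)}
for depth, factor in factors.items():
    print(depth, factor)
```

Even in the best case the product shrinks by a factor of four per layer, which is one reason activations with larger derivatives are often chosen for deep networks.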

Build the whole thing

The AI and Deep Learning track takes you from a single neuron through hand-coded backprop to training real networks, all built from scratch and graded in your browser, ending in a tiny GPT you backpropagate yourself. The first project is free.

Code the backward pass once, and neural networks will never look like magic again.