A tiny network starts out knowing nothing, so its decision boundary is a random smear. Then gradient descent goes to work: backpropagation computes the gradients, the loss falls, and the boundary bends to fit the data. This is real backpropagation, hand-written in NumPy, no frameworks. Pick a shape and watch the network figure out the pattern.
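
For a sense of what hand-written backpropagation in NumPy can look like, here is a minimal sketch. It is not the code behind this demo: the circle dataset, the single hidden layer of 8 tanh units, the sigmoid output, the cross-entropy loss, the learning rate, and every function name below are assumptions chosen for illustration.

```python
import numpy as np

def make_circle(n=200, seed=0):
    """Hypothetical toy shape: label 1 inside a circle of radius 0.6, else 0."""
    rng = np.random.default_rng(seed)
    X = rng.uniform(-1.0, 1.0, size=(n, 2))
    y = (np.sum(X**2, axis=1) < 0.36).astype(float).reshape(-1, 1)
    return X, y

def init_params(hidden=8, seed=1):
    """Random weights: the network starts with an arbitrary decision boundary."""
    rng = np.random.default_rng(seed)
    return {
        "W1": rng.normal(0.0, 1.0, size=(2, hidden)),
        "b1": np.zeros((1, hidden)),
        "W2": rng.normal(0.0, 1.0, size=(hidden, 1)),
        "b2": np.zeros((1, 1)),
    }

def forward(p, X):
    """Forward pass: tanh hidden layer, then a sigmoid output probability."""
    h = np.tanh(X @ p["W1"] + p["b1"])
    z = h @ p["W2"] + p["b2"]
    prob = 1.0 / (1.0 + np.exp(-z))
    return h, prob

def bce(prob, y, eps=1e-9):
    """Mean binary cross-entropy; eps guards against log(0)."""
    return float(-np.mean(y * np.log(prob + eps) + (1 - y) * np.log(1 - prob + eps)))

def backward(p, X, y, h, prob):
    """Backward pass: apply the chain rule layer by layer, output to input."""
    n = X.shape[0]
    dz = (prob - y) / n                   # gradient at the output pre-activation
    dW2 = h.T @ dz
    db2 = dz.sum(axis=0, keepdims=True)
    dh = dz @ p["W2"].T                   # push the gradient back to the hidden layer
    da = dh * (1.0 - h**2)                # derivative of tanh is 1 - tanh^2
    dW1 = X.T @ da
    db1 = da.sum(axis=0, keepdims=True)
    return {"W1": dW1, "b1": db1, "W2": dW2, "b2": db2}

def train(X, y, steps=2000, lr=0.5):
    """Full-batch gradient descent; records the loss before each update."""
    p = init_params()
    losses = []
    for _ in range(steps):
        h, prob = forward(p, X)
        losses.append(bce(prob, y))
        g = backward(p, X, y, h, prob)
        for k in p:
            p[k] -= lr * g[k]             # step downhill along the gradient
    return p, losses

X, y = make_circle()
params, losses = train(X, y)
print(f"loss: {losses[0]:.3f} -> {losses[-1]:.3f}")
```

Evaluating `forward` on a grid of points and thresholding `prob` at 0.5 would trace out the decision boundary as it bends during training.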