Mental Model
Think of a neural network as a sequence of layers, where every neuron in a layer is wired to every neuron in the layer before it. Each of those wires (connections) has a weight attached to it.
These weights are the “knobs” that control the output the network produces when you feed it an input.
At the start, all of these weights are random (the network knows nothing). The goal of backpropagation is to tweak/update these weights so that the network’s outputs fit your training samples as closely as possible.
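In code, this mental model is just a few matrix multiplications. Below is a minimal sketch — the layer sizes, the sigmoid activation, and NumPy itself are my own illustrative choices, not fixed by these notes. The weights start random, there are no bias terms, and the output is entirely determined by the input and the weight “knobs”:

```python
import numpy as np

rng = np.random.default_rng(0)

# A tiny network: 3 inputs -> 4 hidden neurons -> 2 outputs.
# Each weight matrix wires every neuron to every neuron in the previous layer.
W1 = rng.normal(size=(4, 3))   # knobs of layer 1 (random: the network knows nothing yet)
W2 = rng.normal(size=(2, 4))   # knobs of layer 2

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x):
    """Feed an input through the layers; no bias terms anywhere."""
    h = sigmoid(W1 @ x)        # hidden activations
    return sigmoid(W2 @ h)     # output activations

x = np.array([1.0, 0.5, -0.3])
print(forward(x))              # changes whenever the weight "knobs" change
```

Turning any single entry of `W1` or `W2` changes the printed output — that is all “training” does, just in a principled direction.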
The Structure of a Neural Network
The Diagram
(the neurons don’t contain a bias term)
Notation Key
- $\vec{y}$ - the training output vector
- $l$ - the layer index
- $\vec{\sigma}^{(l)}$ - the sigma vector of layer $l$
Using Stochastic/Online Gradient Descent
This update is performed once for every training sample individually, and the whole loop runs for a number of epochs.
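The per-sample update loop can be sketched as follows. Everything concrete here — the XOR-style toy dataset, the 2-8-1 layer sizes, sigmoid activations, squared-error loss, and the learning rate — is an illustrative assumption rather than something fixed by these notes; the network has no bias terms, matching the diagram:

```python
import numpy as np

rng = np.random.default_rng(1)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

# Toy dataset (hypothetical, for illustration only): 2 inputs -> 1 output.
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
Y = np.array([[0.], [1.], [1.], [0.]])

# Random initial knobs; no bias terms.
W1 = rng.normal(scale=0.5, size=(8, 2))
W2 = rng.normal(scale=0.5, size=(1, 8))
lr = 0.5

def mse():
    preds = sigmoid(W2 @ sigmoid(W1 @ X.T))
    return float(np.mean((preds - Y.T) ** 2))

loss_before = mse()
for epoch in range(2000):                        # runs for a number of epochs
    for x, y in zip(X, Y):                       # one update per training sample
        h = sigmoid(W1 @ x)                      # forward pass
        y_hat = sigmoid(W2 @ h)
        d2 = (y_hat - y) * y_hat * (1 - y_hat)   # output delta (squared error, sigmoid')
        d1 = (W2.T @ d2) * h * (1 - h)           # hidden delta, propagated backward
        W2 -= lr * np.outer(d2, h)               # tweak the knobs immediately
        W1 -= lr * np.outer(d1, x)
loss_after = mse()
print(loss_before, "->", loss_after)
```

The defining trait of the stochastic/online variant is the placement of the weight update: inside the per-sample loop, so the knobs move after every single example.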
Using Batch Gradient Descent
This also runs for a number of epochs, but here the weights are updated only once per epoch, using the accumulated updates from the whole training set.
$\Delta W$ contains the weight update matrix of every layer (e.g., $\Delta W^{(2)}$ is the weight update matrix of layer 2).
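Batch gradient descent differs only in *when* the knobs are turned: the per-sample updates are accumulated into one update matrix per layer and applied once per epoch. The sketch below reuses the same illustrative assumptions as before (toy XOR-style dataset, 2-8-1 sigmoid network without biases, squared-error loss):

```python
import numpy as np

rng = np.random.default_rng(2)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
Y = np.array([[0.], [1.], [1.], [0.]])

W1 = rng.normal(scale=0.5, size=(8, 2))
W2 = rng.normal(scale=0.5, size=(1, 8))
lr = 0.5

def mse():
    preds = sigmoid(W2 @ sigmoid(W1 @ X.T))
    return float(np.mean((preds - Y.T) ** 2))

loss_before = mse()
for epoch in range(2000):
    dW1 = np.zeros_like(W1)                      # weight update matrix of layer 1
    dW2 = np.zeros_like(W2)                      # weight update matrix of layer 2
    for x, y in zip(X, Y):
        h = sigmoid(W1 @ x)
        y_hat = sigmoid(W2 @ h)
        d2 = (y_hat - y) * y_hat * (1 - y_hat)
        d1 = (W2.T @ d2) * h * (1 - h)
        dW2 += np.outer(d2, h)                   # accumulate over the whole batch
        dW1 += np.outer(d1, x)
    W2 -= lr * dW2 / len(X)                      # one update per epoch,
    W1 -= lr * dW1 / len(X)                      # averaged over all samples
loss_after = mse()
print(loss_before, "->", loss_after)
```

Compared with the stochastic version, the inner loop only *accumulates* the update matrices; the weights themselves move once per epoch.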