Mental Model

Think of a neural network as a sequence of layers, where every neuron in a layer is wired to every neuron in the layer before it. Each of those wires (connections) has a weight attached to it.

These weights are basically the “knobs” that control the final output when you feed the network an input.

When you start, all of these weights are completely random (the network knows nothing). The goal of backpropagation is to tweak these weights so that the network’s outputs fit your training samples as closely as possible.
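As a minimal sketch of this mental model (the 2-3-1 layer sizes, sigmoid activation, and variable names here are illustrative assumptions, not taken from the diagram below):

```python
import numpy as np

# Hypothetical 2-3-1 network; sizes and names are illustrative.
rng = np.random.default_rng(seed=0)
W1 = rng.normal(size=(3, 2))  # wires from the 2 input neurons to the 3 hidden neurons
W2 = rng.normal(size=(1, 3))  # wires from the 3 hidden neurons to the 1 output neuron

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x):
    # Every connection's weight is a "knob": changing any entry of
    # W1 or W2 changes the final output for the same input x.
    hidden = sigmoid(W1 @ x)
    return sigmoid(W2 @ hidden)

out = forward(np.array([0.5, -1.0]))  # random weights -> essentially arbitrary output
```

With random weights the output is meaningless; training adjusts `W1` and `W2` until `forward` reproduces the training targets.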

The Structure of a Neural Network

The Diagram

(the neurons don’t contain a bias term)


Notation Key

  • $y$ — the training output vector
  • $l$ — the layer index
  • $\sigma^{(l)}$ — the sigma vector of layer $l$

Using Stochastic/Online Gradient Descent

This update is performed once for every individual training sample, and the whole pass over the training set is repeated for several epochs.
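That per-sample loop can be sketched with a toy one-weight model `y_hat = w * x` under squared error (the model, learning rate, and epoch count are illustrative assumptions, not part of the network above):

```python
# Toy one-weight model y_hat = w * x with squared-error loss; the model,
# learning rate, and epoch count are illustrative assumptions.
def grad(w, x, y):
    return 2.0 * (w * x - y) * x  # d/dw of (w*x - y)^2 for ONE sample

def sgd_train(w, samples, targets, lr=0.1, epochs=20):
    for _ in range(epochs):
        for x, y in zip(samples, targets):
            w -= lr * grad(w, x, y)  # stochastic/online: update after EVERY sample
    return w

xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]        # generated by the "true" weight 2.0
w = sgd_train(0.0, xs, ys)  # w converges toward 2.0
```

The key property is that the weight changes inside the inner loop, so each sample sees a slightly different model than the one before it.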

Using Batch Gradient Descent

This also runs for several epochs. $\Delta W$ contains the weight update matrix of every layer (e.g., $\Delta W^{(2)}$ is the weight update matrix of layer 2).
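The batch variant can be sketched the same way as the stochastic one; the only change is that the per-sample updates are accumulated over the whole training set and applied once per epoch (again a toy one-weight model `y_hat = w * x`; names and hyperparameters are illustrative assumptions):

```python
# Toy one-weight model y_hat = w * x with squared-error loss; the model,
# learning rate, and epoch count are illustrative assumptions.
def grad(w, x, y):
    return 2.0 * (w * x - y) * x  # d/dw of (w*x - y)^2 for one sample

def batch_train(w, samples, targets, lr=0.1, epochs=50):
    for _ in range(epochs):
        # Accumulate the update over ALL samples, then apply it once.
        delta = sum(grad(w, x, y) for x, y in zip(samples, targets))
        w -= lr * delta / len(samples)
    return w

w = batch_train(0.0, [1.0, 2.0, 3.0], [2.0, 4.0, 6.0])  # converges toward 2.0
```

In the full network, `delta` would be one accumulated update matrix per layer rather than a single number, which is exactly what the weight update matrices above collect.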

