ch-2 How the backpropagation algorithm works
discusses how to compute the gradient $\nabla C$ of the cost function
Heart of backpropagation
- an expression for the partial derivative $\partial C/\partial w$ of the cost function $C$ with respect to any weight $w$ (or bias $b$) in the network
- tells us how quickly the cost changes when we change the weights and biases (see the first-order estimate below)
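One way to read this, using the standard first-order approximation (the notation $w^l_{jk}$ for an individual weight follows the chapter): a small change $\Delta w^l_{jk}$ in a single weight changes the cost by roughly

$$\Delta C \approx \frac{\partial C}{\partial w^l_{jk}} \, \Delta w^l_{jk},$$

so a large partial derivative means the cost is sensitive to that weight.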
Warm up: a fast matrix-based approach to computing the output from a neural network
Notation
- $w^l_{jk}$: weight for the connection from the $k$-th neuron in layer $l-1$ to the $j$-th neuron in layer $l$; $b^l_j$, $a^l_j$: bias and activation of the $j$-th neuron in layer $l$
Vectorized form
- apply the weight matrix to the activations, then add the bias vector, and finally apply the σ function
- $a^l = \sigma(w^l a^{l-1} + b^l) = \sigma(z^l)$, where $z^l \equiv w^l a^{l-1} + b^l$ is the weighted input to layer $l$ (a numpy sketch follows)
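A minimal numpy sketch of this vectorized forward pass (the names `feedforward`, `weights`, and `biases` are illustrative, not code from the chapter; `weights[i]` and `biases[i]` hold the matrix $w^l$ and column bias vector $b^l$ for one layer):

```python
import numpy as np

def sigmoid(z):
    """The σ activation function, applied elementwise."""
    return 1.0 / (1.0 + np.exp(-z))

def feedforward(a, weights, biases):
    """Return the network output for input activation column vector `a`.

    For each layer, form the weighted input z^l = w^l a^{l-1} + b^l
    and then apply σ elementwise, i.e. a^l = σ(z^l).
    """
    for w, b in zip(weights, biases):
        z = np.dot(w, a) + b   # z^l = w^l a^{l-1} + b^l
        a = sigmoid(z)         # a^l = σ(z^l)
    return a
```

Storing one matrix and one vector per layer keeps the whole forward pass to a single loop of matrix-vector products, which is the point of the matrix-based notation.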
The two assumptions we need about the cost function
- the cost function can be written as an average $C = \frac{1}{n}\sum_x C_x$ over cost functions $C_x$ for individual training examples $x$
- the cost for a single training example can be written as a function of the outputs $a^L$ from the neural network (the quadratic cost below is the standard example)
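For example, the quadratic cost used in the chapter satisfies both assumptions:

$$C = \frac{1}{2n}\sum_x \left\| y(x) - a^L(x) \right\|^2, \qquad C_x = \frac{1}{2}\left\| y(x) - a^L(x) \right\|^2,$$

so $C$ is an average of per-example costs $C_x$, and each $C_x$ depends on the network only through the output activations $a^L$ (the desired output $y(x)$ is a fixed label, not something the network produces).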
The four fundamental equations behind backpropagation
**DEF (error)** $\delta^l_j \equiv \frac{\partial C}{\partial z^l_j}$, the error of neuron $j$ in layer $l$
Error in the output layer (BP1): $\delta^L_j = \frac{\partial C}{\partial a^L_j}\,\sigma'(z^L_j)$, or in matrix form $\delta^L = \nabla_a C \odot \sigma'(z^L)$
Error in terms of the error in the next layer (BP2): $\delta^l = \left((w^{l+1})^T \delta^{l+1}\right) \odot \sigma'(z^l)$
Rate of change of the cost with respect to any bias (BP3): $\frac{\partial C}{\partial b^l_j} = \delta^l_j$
Rate of change of the cost with respect to any weight (BP4): $\frac{\partial C}{\partial w^l_{jk}} = a^{l-1}_k \delta^l_j$ (BP1 to BP4 are implemented in the sketch below)
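A sketch of how BP1 to BP4 translate into code for a single training example with the quadratic cost, assuming the same per-layer list layout of `weights` and `biases` as in the feedforward sketch (`backprop` and `sigmoid_prime` are illustrative names):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    # derivative of σ, needed by BP1 and BP2
    return sigmoid(z) * (1.0 - sigmoid(z))

def backprop(x, y, weights, biases):
    """Gradients of the per-example quadratic cost C_x = 0.5*||y - a^L||^2.

    Returns (nabla_b, nabla_w): lists of arrays matching the shapes of
    `biases` and `weights`, holding ∂C_x/∂b^l_j and ∂C_x/∂w^l_jk.
    """
    nabla_b = [np.zeros(b.shape) for b in biases]
    nabla_w = [np.zeros(w.shape) for w in weights]
    # forward pass, caching every weighted input z^l and activation a^l
    activation = x
    activations = [x]
    zs = []
    for w, b in zip(weights, biases):
        z = np.dot(w, activation) + b
        zs.append(z)
        activation = sigmoid(z)
        activations.append(activation)
    # BP1: error in the output layer (∇_a C = a^L - y for the quadratic cost)
    delta = (activations[-1] - y) * sigmoid_prime(zs[-1])
    nabla_b[-1] = delta                                     # BP3
    nabla_w[-1] = np.dot(delta, activations[-2].T)          # BP4
    # BP2: propagate the error backward through the earlier layers
    for l in range(2, len(weights) + 1):
        delta = np.dot(weights[-l + 1].T, delta) * sigmoid_prime(zs[-l])
        nabla_b[-l] = delta                                 # BP3
        nabla_w[-l] = np.dot(delta, activations[-l - 1].T)  # BP4
    return nabla_b, nabla_w
```

Here `x`, `y`, and the activations are column vectors of shape `(n, 1)`, so `np.dot(delta, activations[-2].T)` is the outer product $a^{l-1}_k \delta^l_j$ that BP4 asks for.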
Proof of the four fundamental equations
using the chain rule repeatedly; as an example, BP1 is derived below
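As a representative case, BP1 in component form follows from a single application of the chain rule to the definition $\delta^L_j = \partial C/\partial z^L_j$:

$$\delta^L_j = \sum_k \frac{\partial C}{\partial a^L_k}\,\frac{\partial a^L_k}{\partial z^L_j} = \frac{\partial C}{\partial a^L_j}\,\sigma'(z^L_j),$$

since $a^L_k = \sigma(z^L_k)$ depends on $z^L_j$ only when $k = j$; BP2 is obtained the same way by expressing $\delta^l_j = \partial C/\partial z^l_j$ in terms of the next layer's errors $\delta^{l+1}_k = \partial C/\partial z^{l+1}_k$.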
The backpropagation algorithm
- correctness of the backward pass: the cost is a function of the network's outputs, so the error is known first at the output layer and must be propagated backward from there
- To understand how the cost varies with earlier weights and biases we repeatedly apply the chain rule, working backward through the layers to obtain usable expressions; in practice backpropagation is combined with a learning algorithm such as stochastic gradient descent (sketched below)
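A minimal sketch of one such mini-batch update, assuming the `backprop` routine sketched above and averaging its per-example gradients to estimate $\nabla C$ (`update_mini_batch` and `eta` are illustrative names):

```python
import numpy as np

def update_mini_batch(mini_batch, weights, biases, eta):
    """One gradient-descent step on the average per-example gradient.

    `mini_batch` is a list of (x, y) pairs and `eta` is the learning
    rate; averaging the gradients of the C_x estimates the gradient of
    C = (1/n) sum_x C_x, which is assumption 1 in action.
    """
    nabla_b = [np.zeros(b.shape) for b in biases]
    nabla_w = [np.zeros(w.shape) for w in weights]
    for x, y in mini_batch:
        dnb, dnw = backprop(x, y, weights, biases)  # per-example gradients
        nabla_b = [nb + d for nb, d in zip(nabla_b, dnb)]
        nabla_w = [nw + d for nw, d in zip(nabla_w, dnw)]
    m = len(mini_batch)
    weights = [w - (eta / m) * nw for w, nw in zip(weights, nabla_w)]
    biases = [b - (eta / m) * nb for b, nb in zip(biases, nabla_b)]
    return weights, biases
```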
Backpropagation: the big picture
a clever way of keeping track of small perturbations to the weights (and biases) as they propagate through the network, reach the output, and then affect the cost
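One concrete way to see (and test) this perturbation picture is to nudge a single weight by a small amount and compare the resulting change in the cost with the partial derivative that backpropagation reports. A hedged sketch (the `cost` callable, the index arguments, and the `eps` value are hypothetical, not part of the chapter):

```python
import numpy as np

def numerical_partial(cost, weights, biases, l, j, k, eps=1e-5):
    """Estimate ∂C/∂w^l_jk by perturbing that single weight.

    `cost(weights, biases)` evaluates the cost C on some fixed data.
    The symmetric difference (C(w + eps) - C(w - eps)) / (2 eps)
    measures how the small perturbation propagates through to the
    cost, which is exactly what backpropagation tracks analytically.
    """
    perturbed = [w.copy() for w in weights]
    perturbed[l][j, k] += eps
    c_plus = cost(perturbed, biases)
    perturbed[l][j, k] -= 2 * eps
    c_minus = cost(perturbed, biases)
    return (c_plus - c_minus) / (2 * eps)
```

Agreement between this estimate and the corresponding entry of `nabla_w` from the `backprop` sketch is a standard sanity check on a backpropagation implementation.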