Calculating the gradients and subtracting them (scaled by the learning rate) from the weights is called backpropagation (strictly speaking, backpropagation is the gradient calculation; the subtraction is the optimizer's update step)
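A minimal sketch of one training step in plain PyTorch (not fastai's exact internals), just to make the two parts concrete: backward() fills in the gradients, then we subtract lr * grad from the weights.

```python
import torch

w = torch.randn(3, requires_grad=True)    # weights
x, y = torch.randn(3), torch.tensor(1.0)  # a toy input/target
lr = 0.1

loss = ((w * x).sum() - y) ** 2           # forward pass
loss.backward()                           # backpropagation: fills w.grad
with torch.no_grad():
    w -= lr * w.grad                      # the update step
    w.grad.zero_()
```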
for freezing - backpropagation updates only the newly added layers, not the earlier (pretrained) layers, until we specifically unfreeze them
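A hedged usage sketch (fastai v1-style API; `data` is an assumed ImageDataBunch, not defined in the lesson notes):

```python
from fastai.vision import *  # assumes fastai v1

learn = cnn_learner(data, models.resnet34, metrics=accuracy)
learn.freeze()        # only the new head gets gradient updates
learn.fit_one_cycle(1)
learn.unfreeze()      # now backprop updates all the layers
learn.fit_one_cycle(1)
```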
Discriminative learning rate
assigning different learning rates to different layers
especially useful for transfer learning, where the earlier layers get a smaller learning rate than the later layers - because we don't want to disturb the weights of the earlier layers, which are already well trained (they learned general features during pretraining)
in the fit function - when we pass a slice for the learning rate, fastai automatically creates 3 layer groups (the learning rate is not applied individually per layer but per layer group) - see the sketch below
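A hedged sketch of the slice syntax (fastai v1-style API): the earliest layer group gets the low end, the head gets the high end, and the middle group gets a value in between.

```python
learn.unfreeze()
learn.fit_one_cycle(4, max_lr=slice(1e-5, 1e-3))
# roughly: group 1 -> 1e-5, group 2 -> ~1e-4, group 3 (the head) -> 1e-3
```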
In the collaborative filtering example - the operations with and without one-hot encoding are mathematically identical (multiplying a one-hot vector by a weight matrix just selects one row of it)
Embedding
means - looking something up in an array (indexing into it)
It is a fast and memory-friendly way of getting the same result as multiplying by a one-hot-encoded matrix - see the sketch below
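A small check that an embedding lookup equals a one-hot matrix multiply (sizes here are illustrative, not from the lesson):

```python
import torch
import torch.nn.functional as F

n_items, n_factors = 5, 3
emb = torch.nn.Embedding(n_items, n_factors)

idx = torch.tensor([2])                    # item number 2
one_hot = F.one_hot(idx, n_items).float()  # shape (1, 5)

via_matmul = one_hot @ emb.weight          # (1, 5) @ (5, 3) -> (1, 3)
via_lookup = emb(idx)                      # direct array lookup
assert torch.allclose(via_matmul, via_lookup)
```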
CollabDataBunch
Collaborative filtering is done through matrix factorization - the ratings matrix is approximated by the product of a user-embedding matrix and an item-embedding matrix. We just learned what that is - see the sketch below.
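A minimal sketch of matrix factorization with embeddings (assumed sizes, not the lesson's exact model): each rating is predicted as the dot product of a user vector and an item vector.

```python
import torch

n_users, n_items, n_factors = 100, 50, 10
user_emb = torch.nn.Embedding(n_users, n_factors)
item_emb = torch.nn.Embedding(n_items, n_factors)

users = torch.tensor([0, 1])
items = torch.tensor([3, 7])
preds = (user_emb(users) * item_emb(items)).sum(dim=1)  # one rating per pair
```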
the collab_learner function has an argument n_factors
which is the width of the embedding matrix (the number of latent factors per user and per item)
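A hedged usage sketch (fastai v1-style API); `ratings` is an assumed DataFrame with user, item, and rating columns:

```python
from fastai.collab import *

data = CollabDataBunch.from_df(ratings, seed=42, valid_pct=0.1)
learn = collab_learner(data, n_factors=40, y_range=(0, 5.5))
learn.fit_one_cycle(3, 5e-3)
```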
Bias is not fixed at 1 - each user and each item gets its own learned bias number, added to the dot product (prediction = u·m + b_user + b_item), e.g. capturing that some movies are liked more overall and some users rate higher overall
"users" and "items" are the nomenclature used for collaborative filtering - in the movie example, the item is the movie
PCA (principal component analysis) - compressing the matrix down to its most important directions, e.g. to inspect the learned embedding matrix in just a few dimensions
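A hedged sketch using torch.pca_lowrank to reduce learned item embeddings to 3 components for inspection (the lesson may use a different helper; the random matrix stands in for trained weights):

```python
import torch

item_w = torch.randn(50, 10)              # stand-in for learned item embeddings
u, s, v = torch.pca_lowrank(item_w, q=3)  # top 3 principal components
item_3d = item_w @ v                      # (50, 3) projection for plotting
```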