Calculating the gradients and subtracting them (scaled by the learning rate) from the weights is called backpropagation (strictly speaking, backpropagation is the gradient calculation; the subtraction is the optimizer's update step)
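A minimal sketch of one training step in plain PyTorch (not fastai's exact internals), just to make the two parts concrete: backward() fills in the gradients, then we subtract lr * grad from the weights.

```python
import torch

w = torch.randn(3, requires_grad=True)    # weights
x, y = torch.randn(3), torch.tensor(1.0)  # a toy input/target
lr = 0.1

loss = ((w * x).sum() - y) ** 2           # forward pass
loss.backward()                           # backpropagation: fills w.grad
with torch.no_grad():
    w -= lr * w.grad                      # the update step
    w.grad.zero_()
```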
for freezing - backpropagation updates only the newly added layers, not the earlier (pretrained) layers, until we specifically unfreeze them
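A hedged usage sketch (fastai v1-style API; `data` is an assumed ImageDataBunch, not defined in the lesson notes):

```python
from fastai.vision import *  # assumes fastai v1

learn = cnn_learner(data, models.resnet34, metrics=accuracy)
learn.freeze()        # only the new head gets gradient updates
learn.fit_one_cycle(1)
learn.unfreeze()      # now backprop updates all the layers
learn.fit_one_cycle(1)
```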
Discriminative learning rate
assigning different learning rates to different layers
especially useful for transfer learning, where the earlier layers get a smaller learning rate than the later layers - because we don't want to disturb the weights of the earlier layers, which are already well trained (they learned general features during pretraining)
in the fit function - when we pass a slice for the learning rate, fastai automatically creates 3 layer groups (the learning rate is not applied individually per layer but per layer group) - see the sketch below
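A hedged sketch of the slice syntax (fastai v1-style API): the earliest layer group gets the low end, the head gets the high end, and the middle group gets a value in between.

```python
learn.unfreeze()
learn.fit_one_cycle(4, max_lr=slice(1e-5, 1e-3))
# roughly: group 1 -> 1e-5, group 2 -> ~1e-4, group 3 (the head) -> 1e-3
```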
In the collaborative filtering example - the operations with and without one-hot encoding are mathematically identical (multiplying a one-hot vector by a weight matrix just selects one row of it)
Embedding
means - looking something up in an array (indexing into it)
It is a fast and memory-friendly way of getting the same result as multiplying by a one-hot-encoded matrix - see the sketch below
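A small check that an embedding lookup equals a one-hot matrix multiply (sizes here are illustrative, not from the lesson):

```python
import torch
import torch.nn.functional as F

n_items, n_factors = 5, 3
emb = torch.nn.Embedding(n_items, n_factors)

idx = torch.tensor([2])                    # item number 2
one_hot = F.one_hot(idx, n_items).float()  # shape (1, 5)

via_matmul = one_hot @ emb.weight          # (1, 5) @ (5, 3) -> (1, 3)
via_lookup = emb(idx)                      # direct array lookup
assert torch.allclose(via_matmul, via_lookup)
```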
CollabDataBunch
Collaborative filtering is done through matrix factorization - the ratings matrix is approximated by the product of a user-embedding matrix and an item-embedding matrix. We just learned what that is - see the sketch below.
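A minimal sketch of matrix factorization with embeddings (assumed sizes, not the lesson's exact model): each rating is predicted as the dot product of a user vector and an item vector.

```python
import torch

n_users, n_items, n_factors = 100, 50, 10
user_emb = torch.nn.Embedding(n_users, n_factors)
item_emb = torch.nn.Embedding(n_items, n_factors)

users = torch.tensor([0, 1])
items = torch.tensor([3, 7])
preds = (user_emb(users) * item_emb(items)).sum(dim=1)  # one rating per pair
```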
the collab_learner function has an argument n_factors
which is the width of the embedding matrix (the number of latent factors per user and per item)
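A hedged usage sketch (fastai v1-style API); `ratings` is an assumed DataFrame with user, item, and rating columns:

```python
from fastai.collab import *

data = CollabDataBunch.from_df(ratings, seed=42, valid_pct=0.1)
learn = collab_learner(data, n_factors=40, y_range=(0, 5.5))
learn.fit_one_cycle(3, 5e-3)
```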
Bias is not fixed at 1 - each user and each item gets its own learned bias number, added to the dot product (prediction = u·m + b_user + b_item), e.g. capturing that some movies are liked more overall and some users rate higher overall
"users" and "items" are the nomenclature used for collaborative filtering - in the movie example, the item is the movie
PCA (principal component analysis) - compressing the matrix down to its most important directions, e.g. to inspect the learned embedding matrix in just a few dimensions
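A hedged sketch using torch.pca_lowrank to reduce learned item embeddings to 3 components for inspection (the lesson may use a different helper; the random matrix stands in for trained weights):

```python
import torch

item_w = torch.randn(50, 10)              # stand-in for learned item embeddings
u, s, v = torch.pca_lowrank(item_w, q=3)  # top 3 principal components
item_3d = item_w @ v                      # (50, 3) projection for plotting
```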