Working Notes: a commonplace notebook for recording & exploring ideas.
-- Kunal

Deep Learning Systems

To save some time, I'm going to speed read the slides / reading first, and then watch videos if it makes sense to me.

Videos:

Lecture 1

each model is basically three things
- a hypothesis with parameters, mapping inputs to outputs
- a loss function specifying how well a hypothesis performs
- an optimization method for finding a set of parameters that approximately minimizes the sum of losses over the training set
linear hypothesis function uses a linear operator (h(x) = theta^T x) == matmul for the transformation
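
A minimal sketch of the linear hypothesis as a plain matrix product, written without any library so the matmul is visible. The name `linear_hypothesis`, the (n, k) layout of `theta`, and the example numbers are my own assumptions for illustration, not something from the slides.

```python
def linear_hypothesis(theta, x):
    """Return the k class scores h(x) = theta^T x.

    theta is a list of n rows with k columns (n features, k classes),
    x is a list of n feature values. Both shapes are assumptions.
    """
    n, k = len(theta), len(theta[0])
    assert len(x) == n, "x must have one entry per row of theta"
    # Score for class j is the dot product of column j of theta with x.
    return [sum(theta[i][j] * x[i] for i in range(n)) for j in range(k)]


theta = [[1.0, 0.0, -1.0],
         [0.5, 2.0, 0.0]]  # n = 2 features, k = 3 classes
x = [2.0, 1.0]
scores = linear_hypothesis(theta, x)
print(scores)
```

In practice this would be a single matmul over a whole batch of inputs; the loop form is only here to show what that matmul computes.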
classification error
- simplest loss function
- 0 if argmax(h(x)) == y, 1 otherwise
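
A small sketch of the 0/1 classification error for one example, assuming `scores` is the list of per-class outputs of the hypothesis and `y` is the integer index of the true class (both names are hypothetical).

```python
def classification_error(scores, y):
    """Return 0 if the highest-scoring class equals y, else 1."""
    # argmax over class indices, written out with the standard library.
    predicted = max(range(len(scores)), key=lambda i: scores[i])
    return 0 if predicted == y else 1


print(classification_error([2.5, 2.0, -2.0], 0))
print(classification_error([2.5, 2.0, -2.0], 1))
```

The error only changes when the argmax flips, so it is piecewise constant in the scores and gives no useful gradient to an optimizer.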
softmax / cross entropy loss
- z_i = p(label = i) = exp(h_i(x)) / sum_j exp(h_j(x))
- loss is the negative log probability of the true class: -log p(label = y) = -h_y(x) + log(sum_j exp(h_j(x)))
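
A sketch of the two softmax formulas above for a single example. Subtracting the maximum score before exponentiating is a standard stability trick that I am adding as an assumption; it does not change the result. Function names are hypothetical.

```python
import math


def softmax(scores):
    """Return z_i = exp(h_i) / sum_j exp(h_j) for a list of scores."""
    m = max(scores)  # shift by the max to avoid overflow; result is unchanged
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(scores, y):
    """Return -h_y + log(sum_j exp(h_j)), the negative log probability of class y."""
    m = max(scores)
    log_sum_exp = m + math.log(sum(math.exp(s - m) for s in scores))
    return -scores[y] + log_sum_exp


scores = [2.5, 2.0, -2.0]
probs = softmax(scores)
loss = cross_entropy(scores, 0)
# The two forms of the loss should agree: -log p(label = y).
print(loss, -math.log(probs[0]))
```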

numerical differentiation -- approximate the gradient with finite differences
- not very efficient (a centered difference needs two function evaluations per parameter), and suffers from numerical errors
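
A minimal sketch of numerical differentiation with a centered difference, (f(theta + eps e_i) - f(theta - eps e_i)) / (2 eps). The function `numerical_gradient`, the step size `eps=1e-5`, and the test function `f` are assumptions chosen for illustration.

```python
def numerical_gradient(f, theta, eps=1e-5):
    """Approximate the gradient of f at theta, one coordinate at a time."""
    grad = []
    for i in range(len(theta)):
        plus = list(theta)
        minus = list(theta)
        plus[i] += eps
        minus[i] -= eps
        # Two evaluations of f per parameter: this is why it scales poorly.
        grad.append((f(plus) - f(minus)) / (2 * eps))
    return grad


def f(theta):
    # Hypothetical test function with a known gradient: [2a + 3b, 3a].
    a, b = theta
    return a ** 2 + 3.0 * a * b


approx = numerical_gradient(f, [1.0, 2.0])
exact = [2 * 1.0 + 3 * 2.0, 3 * 1.0]
print(approx, exact)
```

Comparing `approx` against an analytically known `exact` gradient, within a tolerance, is the usual shape of a gradient check.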
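Automatic Differentiation
- numerical differentiation is a good way to check an auto differentiation implementation
- reverse mode: compute adjoints by walking the computational graph in reverse topological order
- partial adjoints handle multiple pathways: a node that feeds several consumers gets one partial adjoint per consumer, and its adjoint is their sum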
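
A toy sketch of reverse-mode differentiation over scalars, to make the "reverse topological order" and "partial adjoints" ideas concrete. The `Node` class and the `add`, `mul`, `topological_order`, and `backward` helpers are all hypothetical names of my own, not the course's API, and only two operations are supported.

```python
class Node:
    """A value in the computational graph plus its accumulated adjoint."""

    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents  # tuples of (parent_node, local_derivative)
        self.grad = 0.0


def add(a, b):
    return Node(a.value + b.value, ((a, 1.0), (b, 1.0)))


def mul(a, b):
    return Node(a.value * b.value, ((a, b.value), (b, a.value)))


def topological_order(output):
    """Return nodes so that every node appears after all of its inputs."""
    order, seen = [], set()

    def visit(node):
        if id(node) in seen:
            return
        seen.add(id(node))
        for parent, _ in node.parents:
            visit(parent)
        order.append(node)

    visit(output)
    return order


def backward(output):
    """Propagate adjoints from the output back to the inputs."""
    output.grad = 1.0
    for node in reversed(topological_order(output)):
        for parent, local in node.parents:
            # Each pathway contributes one partial adjoint; they are summed.
            parent.grad += node.grad * local


x = Node(3.0)
y = Node(4.0)
z = add(mul(x, y), mul(x, x))  # z = x*y + x*x, so x is reached by three edges
backward(z)
print(z.value, x.grad, y.grad)
```

Here `x` is used in both products, so its adjoint is the sum of the partial adjoints from each use: dz/dx = y + 2x and dz/dy = x. Calling `backward` twice on the same graph would accumulate into the existing `grad` values, which this sketch does not reset.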