Neurons and gradient descent
- Neural nets are interconnected networks of simple processing units, a.k.a. neurons
- The parallel with biology is loose: the artificial neuron is only an approximation! The output y is modelled as a weighted linear combination of the inputs x_j, and a transfer function, or activation function, is applied to make a decision.
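In symbols (a standard formulation, not from the slides; $b_j$ denotes the weight on input $x_j$ and $f$ the activation function):

$$
y = f\Big(\sum_j b_j\, x_j\Big)
$$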
Gradient descent: how to find the right weights?
- We want to find the minimum of an error function
- We start with an initial guess of the parameter b
- We change its value in the direction of steepest descent (derivatives are slopes)
- We continue until reaching a minimum
Steps
- To update a weight $b$, we subtract the derivative from its current value (a numerical sketch follows this list):

$$
b^{(r+1)} = b^{(r)} - a \left.\frac{\partial E}{\partial b}\right|_{b^{(r)}}
$$

- $a$ is called a learning rate: it decides by how much you scale the step vector. A large $a$ can lead to faster convergence than a small one, but a too-large $a$ can lead to disaster.
- $r$ is the iteration.
- Why the minus sign? Because we want to move towards a minimum!
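A minimal sketch of these steps, assuming a toy error function $E(b) = (b-3)^2$ made up for illustration (`a`, `b`, and `r` follow the notation above):

```python
# Toy error function E(b) = (b - 3)^2, with derivative dE/db = 2 * (b - 3).
def dE_db(b):
    return 2.0 * (b - 3.0)

a = 0.1                       # learning rate
b = 0.0                       # initial guess of the parameter
for r in range(50):           # r is the iteration
    b = b - a * dE_db(b)      # step against the slope: hence the minus sign

print(b)                      # approaches the minimum at b = 3
```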
Momentum
- Numerical example: the learning rate controls oscillation and speed, while momentum reuses a fraction of the previous step (see the sketch below).
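A sketch of momentum on the same toy error function as above; the momentum coefficient `mu` and its value are my own assumed choices:

```python
# Toy error function E(b) = (b - 3)^2, now minimized with momentum.
def dE_db(b):
    return 2.0 * (b - 3.0)

a, mu = 0.1, 0.9              # learning rate and (assumed) momentum coefficient
b, v = 0.0, 0.0               # parameter and velocity (the previous step)
for r in range(100):
    v = mu * v - a * dE_db(b) # reuse a fraction of the previous step
    b = b + v

print(b)                      # oscillates, then settles near the minimum b = 3
```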
The models that use these ideas: the multilayer perceptron and convolutional neural networks (deep learning).
Multilayer perceptron (MLP)
(from a linear classifier to a nonlinear one)
- It’s a feed-forward network: it goes from inputs to outputs, without loops
- Every neuron includes an activation function (e.g. sigmoid, see earlier slides)
Chain rule
- How do we compute gradient descent on a cascade of operations?
- How do we reach all the trainable weights?
- The forward pass: you put a data point in and obtain the prediction
- The backward pass: you update parameters by back-propagating errors (a sketch follows the exercise below).

Exercise
Describe the architecture of the following network.
- How many layers are there? 2
- How many neurons per layer? 3, 1
- How many trainable parameters (weights and biases)? 10
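A minimal forward/backward pass in Python; this is my own toy example, sized to match the exercise answers (1 input, a hidden layer of 3 sigmoid neurons, 1 output neuron: 3 + 3 + 3 + 1 = 10 trainable parameters):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(3, 1)), np.zeros((3, 1))  # hidden layer: 3 neurons
W2, b2 = rng.normal(size=(1, 3)), np.zeros((1, 1))  # output layer: 1 neuron

x, t = np.array([[0.5]]), np.array([[1.0]])         # one data point and its target
a = 0.1                                             # learning rate

for r in range(1000):
    # forward pass: put the data point in, obtain the prediction
    h = sigmoid(W1 @ x + b1)
    y = sigmoid(W2 @ h + b2)

    # backward pass: chain rule on the squared error E = 0.5 * (y - t)^2
    dy = (y - t) * y * (1 - y)          # error signal at the output neuron
    dW2, db2 = dy @ h.T, dy
    dh = (W2.T @ dy) * h * (1 - h)      # error back-propagated to the hidden layer
    dW1, db1 = dh @ x.T, dh

    # gradient descent update on every trainable parameter
    W1 -= a * dW1; b1 -= a * db1
    W2 -= a * dW2; b2 -= a * db2

print(y.item())                         # moves towards the target 1.0
```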
Convolutional Neural Networks (CNN)
- The proposal was a network learning spatial filters on top of spatial filters. Such networks were (and are) called convolutional neural networks.
- The CNN was considered interesting, but very hard to train. It needed:
  - loads of training data, which nobody had
  - a lot of computational power, likewise
What changed
- Big data and graphics processing units (GPUs) became available.
How they work
- They are conceptually very similar to MLPs
- But their weights (b) are 2D convolutional filters
- For this reason, they are very well suited for images
- In convolutional neural networks, the filters are spatial (on 2D grids); see the sketch below:
  - local: they convolve the values of the image in a local window
  - shared: the same filter is applied everywhere in the image. Why shared? “Recycling” the same operation allows for far fewer parameters to learn!
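A minimal sketch of a local, shared 2D filter (my own example; the Sobel-like kernel is just one classic choice):

```python
import numpy as np

def conv2d(image, kernel):
    """Slide the SAME small kernel over every position of the image."""
    kh, kw = kernel.shape
    H, W = image.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            window = image[i:i + kh, j:j + kw]   # local: a small window only
            out[i, j] = np.sum(window * kernel)  # shared: same weights at every (i, j)
    return out

image = np.random.rand(8, 8)
edge_filter = np.array([[-1., 0., 1.],
                        [-2., 0., 2.],
                        [-1., 0., 1.]])          # 9 parameters, reused everywhere
print(conv2d(image, edge_filter).shape)          # (6, 6)
```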
Learn
- By “learns”, I mean adapting the filter weights to minimize the prediction error (i.e. backpropagation)
Steps
- Convolutional filters start with random numbers
- Iteratively, they are improved: each coefficient is updated in the direction that most decreases the cost function (sketched below)
- At the end, the filters become quite meaningful!
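A sketch of these steps on a toy task I made up: recover a known filter from its output by gradient descent on the mean squared error (`scipy.signal.correlate2d` performs the same sliding-window operation as the `conv2d` sketch above):

```python
import numpy as np
from scipy.signal import correlate2d   # same operation as conv2d above

rng = np.random.default_rng(0)
image = rng.random((16, 16))
true_filter = np.array([[-1., 0., 1.],
                        [-2., 0., 2.],
                        [-1., 0., 1.]])
target = correlate2d(image, true_filter, mode='valid')  # what the filter should produce

K = rng.normal(size=(3, 3))            # the filter starts with random numbers
a = 0.1                                # learning rate
for r in range(1000):
    residual = correlate2d(image, K, mode='valid') - target   # prediction error
    # gradient of the mean squared error w.r.t. each filter coefficient
    grad = 2.0 * correlate2d(image, residual, mode='valid') / residual.size
    K -= a * grad                      # steepest-descent update

print(np.round(K, 2))                  # close to true_filter: the filter became meaningful
```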
Summary
- Perceptrons are “neuron-inspired” linear discriminants
- Multilayer perceptrons are trainable, nonlinear discriminants
- Feed-forward neural networks in general can be used for classification, regression and feature extraction
- There is a large body of alternative neural nets
- Key problems in the application of ANNs are choosing the right size and good training parameters
- Convolutional neural networks have a constrained architecture encoding prior knowledge of the problem
- Deep learning is concerned with constructing extremely large neural networks that depend on:
  - special hardware (GPUs), to be able to train them
  - specific tricks (e.g. rectified linear units) to prevent overfitting