Today let’s take a high-level view of Artificial
Neural Networks, one of the most powerful machine learning methods around
and the rabbit hole that leads to the wonderland of deep learning, which is
arguably the hottest area in computer science today.
So let’s get straight to it. A typical architecture of an ANN looks like this:
As you can see, it consists of stacked layers of perceptrons, each
having a non-linear activation function. Each perceptron, or neuron, is a
computational unit that takes in inputs $x_{1}, x_{2}, x_{3}, \ldots$ and 1 (the bias
input) and outputs $f(W^{T}x)$, where $W$ is its corresponding weight vector, $x$ is the
input vector, and $f$ is the chosen activation function.
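To make this concrete, here is a minimal NumPy sketch of a single neuron. The input values, weights, and the choice of a sigmoid activation are placeholder assumptions purely for illustration:

```python
import numpy as np

def sigmoid(z):
    # one common choice of non-linear activation f
    return 1.0 / (1.0 + np.exp(-z))

# hypothetical values, just for illustration: inputs x1, x2, x3 plus the constant bias input 1
x = np.array([0.5, -1.2, 3.0, 1.0])
# weight vector W; the last entry is the weight on the bias input
W = np.array([0.2, 0.4, -0.1, 0.05])

output = sigmoid(W.T @ x)  # f(W^T x)
print(output)
```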
This is a crude simulation of a
biological neuron, which receives inputs from various regions through its
dendrites and fires when the activation value exceeds some threshold.
The inputs are weighted at the synaptic junctions, and these weights,
which are dynamically adjusting, are the learnable parameters of the
neuron.
The first layer is the input layer and the rightmost layer is
the output layer. The layers in between are called ‘hidden layers’, and their
number is a hyperparameter that we choose beforehand depending upon the
complexity of the model we wish to build. Neural networks can form very complex
decision boundaries because these hidden layers stack several levels of
non-linearity.
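As a rough sketch of what choosing such an architecture looks like, the network can be written down as a list of layer sizes, with one weight matrix between each pair of consecutive layers. The particular sizes here are arbitrary assumptions:

```python
import numpy as np

# hypothetical architecture: 3 inputs, two hidden layers of 4 neurons each, 2 outputs;
# the number (and size) of the hidden layers is the hyperparameter we choose beforehand
layer_sizes = [3, 4, 4, 2]

# one weight matrix between each pair of consecutive layers; these are the learnable
# parameters, initialized with small random numbers (the +1 covers the bias input)
weights = [np.random.randn(n_out, n_in + 1) * 0.01
           for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]

for i, W in enumerate(weights):
    print(f"weights into layer {i + 1}: shape {W.shape}")
```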
All the neurons in one layer are connected to the ones in
the next with some weights that we wish to learn. The learning procedure starts
by initializing all the weights with small random numbers. The rest
of the procedure consists of successive forward passes in which the input layer
takes in the training data, each neuron in the layer computes its respective
value and feeds it forward to the hidden layers, and the process continues all the
way up to the output layer.
Once the output layer has its prediction values, the loss
is computed using a suitable loss function and is backpropagated towards the
input side. Backpropagation is basically an application of the chain
rule across each neuron: the gradients of the objective function with respect to the
weight vectors are computed by multiplying the local gradients with the incoming
gradient. With these gradients the weight vectors are then updated by the
gradient descent method that we discussed in the earlier posts. With the new
weight vectors forward propagation happens again and the same procedure
repeats. This is a very high-level and informal description of what goes on
in neural nets. For a concrete mathematical formulation I recommend going
through this link and also this.
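To tie the whole loop together, here is a minimal NumPy sketch of this forward pass / backpropagation / gradient descent cycle for a tiny two-layer network. Everything here is a simplifying assumption: random toy data, sigmoid activations, a mean squared error loss, and bias terms omitted for brevity.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# toy data, purely for illustration
X = np.random.randn(5, 3)            # 5 training examples, 3 features
y = np.random.randint(0, 2, (5, 1))  # binary targets

# initialize the weights with small random numbers
W1 = np.random.randn(3, 4) * 0.01    # input layer -> hidden layer
W2 = np.random.randn(4, 1) * 0.01    # hidden layer -> output layer
lr = 0.1                             # learning rate for gradient descent

for step in range(1000):
    # forward pass
    h = sigmoid(X @ W1)               # hidden layer activations
    y_hat = sigmoid(h @ W2)           # output layer predictions

    # loss: mean squared error (one possible choice of loss function)
    loss = np.mean((y_hat - y) ** 2)

    # backpropagation: chain rule, local gradient times incoming gradient
    d_yhat = 2 * (y_hat - y) / len(X)
    d_z2 = d_yhat * y_hat * (1 - y_hat)   # sigmoid'(z) = f(z)(1 - f(z))
    dW2 = h.T @ d_z2
    d_h = d_z2 @ W2.T
    d_z1 = d_h * h * (1 - h)
    dW1 = X.T @ d_z1

    # gradient descent update, then the loop repeats with the new weights
    W1 -= lr * dW1
    W2 -= lr * dW2
```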
Deep learning, which has shown mind-blowing results in the past few years, is a field built upon ANNs. It basically deals with modelling neural networks in ways that scale to a large number of hidden layers. I will try to cover some of these cool models in future posts. :-)