Sunday, August 28, 2016

A High Level Guide to Artificial Neural Networks



Today let’s take a high-level view of Artificial Neural Networks (ANNs), one of the most powerful machine learning methods around and the rabbit hole that leads to the wonderland of deep learning, arguably the hottest area in computer science today.

So let’s get straight to it. A typical architecture of an ANN looks like this:



As you can see, it consists of stacked layers of perceptrons, each with a non-linear activation function. Each perceptron, or neuron, is a computational unit that takes in inputs $x_{1}, x_{2}, x_{3}, \ldots$ along with a constant 1 (the bias input) and outputs $f(W^{T}X)$, where $W$ is its weight vector and $f$ is the chosen activation function.
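To make this concrete, here is a minimal sketch of a single neuron in NumPy. The inputs, weights, and the choice of a sigmoid activation are all made up for illustration; the bias is handled by appending the constant 1 to the input, as described above.

```python
import numpy as np

# Illustrative single neuron: made-up inputs and weights, sigmoid activation.
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.5, -1.2, 3.0])         # inputs x1, x2, x3
x = np.append(x, 1.0)                  # append the constant bias input 1
W = np.array([0.1, 0.4, -0.2, 0.05])   # weight vector; last entry weights the bias

output = sigmoid(np.dot(W, x))         # f(W^T X)
print(output)
```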




This is a crude simulation of a biological neuron, which receives inputs from various regions through its dendrites and fires when the activation value exceeds some threshold. The inputs are weighted at the synaptic junctions, and these weights, which are constantly adjusting, are the learnable parameters of the neuron.

The first layer is the input layer and the rightmost layer is the output layer. The layers in between are called ‘hidden layers’, and their number is a hyperparameter that we choose beforehand depending on the complexity of the model we wish to build. Neural networks can form very complex decision boundaries because these hidden layers stack several layers of non-linearity.
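As an illustration of how these choices play out, the sketch below (in NumPy, with an arbitrary choice of layer sizes) shows how a list of layer sizes determines the shape of each weight matrix; the extra column in each matrix holds the bias weight.

```python
import numpy as np

# The number and size of the hidden layers are hyperparameters.
# Layer sizes chosen arbitrarily: 3 inputs, two hidden layers of 4, 1 output.
layer_sizes = [3, 4, 4, 1]

rng = np.random.default_rng(0)
# One weight matrix per pair of consecutive layers; "+ 1" accounts for the bias.
weights = [rng.normal(scale=0.01, size=(n_out, n_in + 1))
           for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]

for i, W in enumerate(weights):
    print(f"layer {i + 1}: weight matrix of shape {W.shape}")
```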

All the neurons in one layer are connected to the neurons in the next layer through weights that we wish to learn. The learning procedure starts by initializing all the weights with small random numbers. The rest of the procedure consists of successive forward passes: the input layer takes in the training data, each neuron computes its value and feeds it forward through the hidden layers, and the process continues all the way to the output layer. Once the output layer has its predictions, a loss is computed with a suitable loss function and is backpropagated towards the input side.

Backpropagation is essentially an application of the chain rule at each neuron: the gradient of the objective function with respect to each weight vector is obtained by multiplying the neuron’s local gradient with the incoming gradient. With these gradients, the weight vectors are updated by the gradient descent method that we discussed in earlier posts. With the new weight vectors, forward propagation happens again and the same procedure repeats. This is a very high level and informal description of what goes on in neural nets. For a concrete mathematical formulation I recommend you go through this link and also this.
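To tie the whole procedure together, here is a small, informal sketch of this training loop for a network with one hidden layer. Everything in it is an assumption made for illustration: the toy data, the sigmoid activations, the squared-error loss, the omitted biases, and the learning rate are not from the post.

```python
import numpy as np

# Minimal sketch of forward pass, backpropagation, and gradient descent
# for a 3 -> 4 -> 1 network with sigmoid activations (biases omitted).
def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))             # 5 toy training examples, 3 features
y = rng.integers(0, 2, size=(5, 1))     # toy binary targets

W1 = rng.normal(scale=0.01, size=(3, 4))   # input -> hidden weights
W2 = rng.normal(scale=0.01, size=(4, 1))   # hidden -> output weights
lr = 0.1                                    # learning rate (arbitrary)

for step in range(100):
    # Forward pass
    h = sigmoid(X @ W1)                # hidden activations
    y_hat = sigmoid(h @ W2)            # output predictions
    loss = np.mean((y_hat - y) ** 2)   # squared-error loss

    # Backward pass: chain rule applied layer by layer
    d_yhat = 2 * (y_hat - y) / len(X)      # dLoss/dy_hat
    d_z2 = d_yhat * y_hat * (1 - y_hat)    # through the output sigmoid
    d_W2 = h.T @ d_z2                      # dLoss/dW2
    d_h = d_z2 @ W2.T                      # gradient flowing into the hidden layer
    d_z1 = d_h * h * (1 - h)               # through the hidden sigmoid
    d_W1 = X.T @ d_z1                      # dLoss/dW1

    # Gradient descent update
    W1 -= lr * d_W1
    W2 -= lr * d_W2

print("final loss:", loss)
```

Each pass through the loop is exactly the cycle described above: forward propagate, compute the loss, push gradients back with the chain rule, and nudge the weights downhill.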

Deep learning, which has shown mind-blowing results in the past few years, is a field built upon ANNs. It basically deals with modelling neural networks in ways that scale to a large number of hidden layers. I will try to cover some of these cool models in my future posts. :-)



