How Neural Networks Work: A Simple Explanation
Over the last few years, "neural network" has been one of the most used buzzwords among computer and information technology professionals. A neural network is a machine learning algorithm, and machine learning is itself a subfield of artificial intelligence. The concept was inspired by the connections between neurons in the human brain, whose mechanism neural networks try to mimic. The following article explains the concept of a neural network and how it works.
Finding patterns
The primary task of a neural network is to find patterns in large amounts of data; data is the fuel of a neural network. A neural network model takes inputs and makes predictions according to the training it has received. So, after building a model, the first and most important job is to train it on a large dataset. Training is the most vital phase of a neural network. But what does a neural network actually do during training? It tries to find the relationship between the given data and the target outputs: at each step of training, the model measures its error and updates itself to reduce it.
The network model
A neural network consists of many layers of neurons. Each layer has several neurons, and every neuron in a layer is connected to every neuron in the next layer. The first layer is the input layer, the final layer is the output layer, and all the layers in between are called hidden layers. When a neural network is first initialized, each connection between neurons is assigned a random value called a weight, typically drawn from a normal distribution with zero mean. Each neuron is also assigned a random value called a bias. These weights and biases are the main focus of a neural network: during training, the model tries to minimize the error cost by updating them. Because the weights and biases were initialized randomly, chances are they will not give the expected outputs at first, so the job of training is to update them. The training phase of a neural network consists of two concepts, feedforward and backpropagation, which are discussed in detail below.
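The initialization described above can be sketched in a few lines of Python. This is a minimal illustration, not a production recipe: the layer sizes (2 inputs, 3 hidden neurons, 1 output) are arbitrary choices made for this example.

```python
import numpy as np

rng = np.random.default_rng(0)
layer_sizes = [2, 3, 1]  # illustrative: input, hidden, output layer sizes

# One weight matrix per pair of adjacent layers, drawn from a
# zero-mean normal distribution (one row per neuron in the later layer,
# one column per neuron in the earlier layer).
weights = [rng.normal(0.0, 1.0, size=(n_out, n_in))
           for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]

# One random bias per neuron in every layer after the input layer.
biases = [rng.normal(0.0, 1.0, size=(n_out, 1))
          for n_out in layer_sizes[1:]]

print([w.shape for w in weights])  # [(3, 2), (1, 3)]
```

Training then consists of repeatedly adjusting the numbers in `weights` and `biases`, as the following sections describe.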
Feedforward
Feedforward is the process by which inputs are taken and an output is calculated using the existing weights and biases. The input data is fed into the first layer, which has one neuron per input value. The activations of the neurons in the subsequent layers are then calculated layer by layer. The activation is the final value of a neuron, the value it passes on to the next layer; for the first layer, the input itself is the activation. To calculate the activation of a neuron, the weight of each connection from the previous layer is multiplied by the activation of the corresponding neuron in the previous layer, the products are summed, and the bias of the current neuron is added. A sigmoid function is then applied to this sum to produce the neuron's activation. The resulting values are passed on to the next layer, and the process is repeated until the final, or output, layer. This is done for every input, producing a prediction. The next step, backpropagation, is discussed below.
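The feedforward pass above can be sketched as a short function. The network shape and input values here are illustrative assumptions, not part of the original text.

```python
import numpy as np

def sigmoid(z):
    # Squash any real number into the range (0, 1).
    return 1.0 / (1.0 + np.exp(-z))

def feedforward(x, weights, biases):
    # The input layer's activation is the input itself.
    a = x
    # At each later layer: multiply by the weights, add the biases,
    # then apply the sigmoid to get that layer's activations.
    for w, b in zip(weights, biases):
        a = sigmoid(w @ a + b)
    return a

# Illustrative network: 2 inputs, 3 hidden neurons, 1 output.
rng = np.random.default_rng(0)
weights = [rng.normal(size=(3, 2)), rng.normal(size=(1, 3))]
biases = [rng.normal(size=(3, 1)), rng.normal(size=(1, 1))]

x = np.array([[0.5], [-1.2]])          # one input as a column vector
y_hat = feedforward(x, weights, biases)  # prediction in (0, 1)
```

Because the final layer also applies the sigmoid, the prediction always lies between 0 and 1, which is why this setup is commonly paired with targets in that range.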
Gradient Descent
Before we jump into backpropagation, it helps to introduce the concept of gradient descent. There is nothing to panic about; the concept is not too hard, especially compared with backpropagation. We apply gradient descent when we update the weights. It can be visualized as a U-shaped curve on a graph, where the horizontal axis represents a weight and the vertical axis represents the error given by the cost function. We start at a point on the curve and calculate the slope (the tangent line) at that point; then we take a small step downhill, and repeat until we reach the bottom of the U-shaped curve. This process is gradient descent. Updating a weight works the same way: we try to reach the bottom of the curve by minimizing the error.
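The walk down the U-shaped curve can be sketched in one dimension. The cost function here, C(w) = (w - 3)², is a made-up example whose minimum sits at w = 3; its derivative is 2(w - 3).

```python
def gradient_descent(grad, w0, learning_rate=0.1, steps=100):
    # Repeatedly step downhill: subtract the slope times the step size.
    w = w0
    for _ in range(steps):
        w -= learning_rate * grad(w)
    return w

# Example cost C(w) = (w - 3)^2, so dC/dw = 2 * (w - 3).
w_min = gradient_descent(lambda w: 2 * (w - 3), w0=0.0)
print(w_min)  # converges to roughly 3.0, the bottom of the curve
```

Notice that the step shrinks automatically as the bottom approaches, because the slope itself shrinks there; the learning rate only scales that step.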
Backpropagation
Backpropagation is among the hardest concepts in neural networks, and a good understanding of mathematics, especially calculus and linear algebra (including matrices), is essential. Most people find the concept difficult to grasp at first, but with due concentration, and by keeping the larger picture in view, it can be understood well. The first step of backpropagation is to calculate the error by subtracting the predicted output from the target output. We then update the weights and biases of each layer, step by step, using this error. Because there are many weights and many steps between a weight and the error, updating the weights seems very difficult; this is where calculus comes into play. We need the derivative of the error with respect to the weight of each connection, and since there are many steps in between, we apply the chain rule. First, we find the derivative of the error of each output-layer neuron with respect to that neuron's activation. Next, we find the derivative of the activation with respect to the neuron's weighted sum of inputs. Finally, we calculate the derivative of that weighted sum with respect to the weight connected to the neuron. Multiplying these three derivatives gives the actual derivative of the error with respect to the weight. In the final step, the weight is updated by subtracting this derivative multiplied by the learning rate. The learning rate is the size of the step we take at each update. A large learning rate minimizes the error quickly but risks overshooting the lowest point of the gradient descent curve, whereas steps that are too small make training take too long. So a moderate learning rate, found by trial and error, should be used for better performance. We keep updating the weights layer by layer, and once all the layers are updated we move on to the next input and repeat the process.
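The three chain-rule factors above can be made concrete with the smallest possible case: one neuron with one weight and a squared-error cost C = (a - y)². All the numbers here (input, target, starting weight, learning rate) are illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x, y = 1.5, 0.0   # input and target output (illustrative values)
w, b = 0.8, 0.1   # current weight and bias
lr = 0.5          # learning rate

# Feedforward: weighted sum, then activation.
z = w * x + b
a = sigmoid(z)    # predicted output

# The three chain-rule factors:
dC_da = 2 * (a - y)    # error w.r.t. the neuron's activation
da_dz = a * (1 - a)    # sigmoid activation w.r.t. the weighted sum
dz_dw = x              # weighted sum w.r.t. the weight

# Multiply the three derivatives, then step downhill.
dC_dw = dC_da * da_dz * dz_dw
w -= lr * dC_dw
b -= lr * dC_da * da_dz  # the weighted sum's derivative w.r.t. b is 1
```

In a full network the same three-factor product is computed layer by layer, working backward from the output, with the matrices of weights taking the place of the single `w` here.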
Prediction
This is the stage where we actually use the neural network model in our work. We give inputs to the model, and it predicts outputs based on the latest weights and biases. These weights and biases can also be hard-coded into a program to make predictions on low-capacity devices; in that case, the weights and biases can no longer be updated by further training.