Artificial Intelligence, Machine Learning, Deep Learning, and Neural Networks are all buzzwords right now. This article is the first in a planned series that focuses on understanding and implementing the concepts of deep learning and artificial intelligence.
“Artificial Intelligence is the new electricity. Similar to how electricity revolutionized a number of industries about a hundred years ago, Artificial Intelligence will transform and revolutionize different industries” - Andrew Ng
Artificial Intelligence and Deep Learning are becoming some of the most important components of modern businesses. Smart systems are now routinely built to solve problems that were once considered too complex to automate. Examples include automatic speech recognition in smartphones, conversational chatbots, image classification and clustering in search engines, and natural language generation and understanding.
At a broad level, Artificial Intelligence and Deep Learning aim to create intelligent machines. At a deeper level, they consist of mathematical relationships, sophisticated optimization algorithms, and models that generate intelligence in the form of predictions, segmentation, clustering, forecasting, classification, and so on.
Neural networks are the building blocks of Deep Learning models and of many Artificial Intelligence systems. In this artificial intelligence neural network tutorial, we will look at how neural networks work and the science behind them.
A neural network is a mathematical model designed to function similarly to biological neurons and the nervous system. These models are used to recognize complex patterns and relationships that exist within labeled data. A labeled dataset contains two types of variables: predictors (features used as independent variables in the model) and targets (features treated as dependent variables of the model).
The core structure of a neural network model consists of a large number of simple processing nodes that are interconnected and organized in layers. An individual node in a layer is connected to several nodes in the previous layer and the next layer. Each layer receives the outputs of the previous layer as its inputs, processes them, and passes its own output on to the next layer.
The first layer of this architecture is called the input layer, which accepts the inputs; the last layer is called the output layer, which produces the output; and every layer between the input and output layers is called a hidden layer. Let us now look at how an artificial neural network works and how to apply it.
A neuron is the smallest unit in a neural network. It accepts various inputs, applies a function and computes the output.
Each incoming connection corresponds to a different input to the node. To each of its connections, the node assigns a number known as a “weight”. The weight of an input variable signifies the importance or priority of that variable relative to the other variables. The node multiplies each input value by its associated weight and adds the resulting products together. It also adds another term called the “bias”, which shifts the activation function to the left or right. This sum is then passed through an activation function (described in the next section), which maps it to the node’s output.
Let’s understand this through an example. Take a situation in which you want to purchase a new house, and you will make the decision based on seven factors ranked in order of priority.
The best way to formalize the decision making based on this situation is to formulate a mathematical equation with:
Inputs: x1, x2, x3, …
Weights: w1, w2, w3, …
Node Bias term: “b”
Node Input (Z) = w1*x1 + w2*x2 + w3*x3 + w4*x4 + … + w7*x7 + b
Node Output (A) = g (Z)
Here the function “g” is known as the activation function. Let’s understand how this activation function works.
The main goal of an activation function is to apply a non-linear transformation to the node input and map it to the output. For example, a linear combination of the 7 variables related to the house is mapped to one of two target output classes: “Buy the Property” and “Do not buy the Property”.
The decision boundary of the output can be given by a threshold value. If the generated value is below the threshold, the node outputs 0, otherwise 1. Each output (0 or 1) corresponds to one of the two decisions. In our example, Z is the node input, A is the generated output (activation), and the bias “b” sets the threshold: the node outputs 1 when the weighted sum of the inputs is at least -b, which is the same as Z being at least 0. If it is, you buy the property; otherwise you do not, i.e.
If Z >= 0, then A = 1, interpretation = “buy house”
else (Z < 0), A = 0, interpretation = “do not buy”
WX + b >= 0, 1 (“buy house”)
WX + b < 0, 0 (“do not buy”)
This is the basic form of the activation function applied to each neuron. In the above example, we have applied the step function as the activation function.
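To make the house example concrete, here is a minimal Python sketch of a single neuron with a step activation. The seven input scores, the weights, and the bias value are made up for illustration (the article does not give numbers), and the helper names step and neuron_output are hypothetical.

```python
def step(z):
    """Step activation: 1 when z is at or above 0, otherwise 0."""
    return 1 if z >= 0 else 0


def neuron_output(inputs, weights, bias):
    """Compute Z = w1*x1 + ... + wn*xn + b, then A = g(Z) with g = step."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return step(z)


# Hypothetical scores (0 to 1) for seven house-related factors, with made-up
# weights listed from the most important factor to the least important.
x = [0.9, 0.8, 0.4, 0.7, 0.2, 0.5, 0.1]
w = [0.35, 0.25, 0.15, 0.10, 0.08, 0.05, 0.02]
b = -0.5  # assumed bias: the node fires once the weighted sum reaches 0.5

decision = neuron_output(x, w, b)
print("buy house" if decision == 1 else "do not buy")
```

With these assumed numbers the weighted sum is 0.688, so Z = 0.188 and the neuron outputs 1. Making the bias more negative raises the bar the weighted sum has to clear.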
There are other choices of non-linear activation functions, such as ReLU, sigmoid, and tanh.
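For reference, these three functions can be written in a few lines of Python. This is a sketch using their standard textbook definitions and only the standard library. The two-branch form of sigmoid is an implementation choice made here to avoid overflow for large negative inputs.

```python
import math


def relu(z):
    """ReLU: passes positive values through and clips negatives to 0."""
    return max(0.0, z)


def sigmoid(z):
    """Sigmoid: squashes any real number into the range (0, 1)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # for negative z this form avoids overflow in exp
    return e / (1.0 + e)


def tanh(z):
    """Tanh: squashes any real number into the range (-1, 1)."""
    return math.tanh(z)


for z in (-2.0, 0.0, 2.0):
    print(z, relu(z), round(sigmoid(z), 3), round(tanh(z), 3))
```

Unlike the step function, these functions change smoothly (or piecewise linearly, for ReLU), which is what lets the derivative-based training described later in the article work.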
A neural network model goes through a process called forward propagation, in which it propagates the inputs and their activations forward through the layers to get the final output. The computation steps involved in each layer are:
Z = W*X + b
A = g(Z)
g is the activation function
A is the predicted output using the input variables X.
W is the weight matrix
b is the bias vector (one bias per node)
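The two steps above can be sketched in plain Python for a small network. The layer sizes, the parameter values, and the choice of sigmoid for g are assumptions made for illustration, and the function names layer_forward and forward_propagation are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def layer_forward(W, X, b, g):
    """One layer: Z = W*X + b, then A = g(Z) applied to each node.

    W is a list of rows (one row of weights per node), X is the list of
    inputs to the layer, and b holds one bias per node.
    """
    Z = [sum(w * x for w, x in zip(row, X)) + b_i for row, b_i in zip(W, b)]
    return [g(z) for z in Z]


def forward_propagation(X, layers, g=sigmoid):
    """Pass X through each (W, b) pair in turn; the last A is the prediction."""
    A = X
    for W, b in layers:
        A = layer_forward(W, A, b, g)
    return A


# Made-up parameters: 2 inputs -> 2 hidden nodes -> 1 output node.
layers = [
    ([[0.5, -0.5], [1.0, 1.0]], [0.0, -1.0]),
    ([[1.0, -1.0]], [0.0]),
]
prediction = forward_propagation([1.0, 1.0], layers)
print(prediction)
```

The output A of one layer becomes the input X of the next, which is all that "forward" means here.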
To optimize the weight and bias matrices used in the neural network, the model computes the error in its output and makes small changes to their values so that the overall error is reduced. This error is measured by two functions, the loss function and the cost function.
The loss function measures the error between the predicted value and the actual value for a single training example. One simplistic loss function is the difference between the actual value and the predicted value.
Loss = Y - A
Y = Actual Value, A = Predicted Value
The cost function computes the sum of the loss function values over every training example.
Cost Function = Summation (Loss Function)
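As a small illustration, the loss and cost defined above can be written directly in Python. The third function, squared_error_cost, is not from this article: it is a commonly used alternative, included here as an assumption, because signed differences of opposite sign cancel each other when summed. All sample values are made up.

```python
def loss(y, a):
    """Loss as defined in the text: actual value minus predicted value."""
    return y - a


def cost(actual, predicted):
    """Sum of the loss over every training example."""
    return sum(loss(y, a) for y, a in zip(actual, predicted))


def squared_error_cost(actual, predicted):
    """Assumed alternative (not from the text): sum of squared losses."""
    return sum(loss(y, a) ** 2 for y, a in zip(actual, predicted))


Y = [1.0, 0.0, 1.0]  # actual values
A = [0.5, 0.5, 1.0]  # predicted values

print(cost(Y, A))                # errors of +0.5 and -0.5 cancel out
print(squared_error_cost(Y, A))  # squaring keeps both errors visible
```

Here the simple cost comes out as 0 even though two of the three predictions are off, which is why squared or absolute errors are usually preferred in practice.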
The final step is to minimize the error term (cost function) and obtain the optimal values of the weights and bias terms. A neural network model does this through a process called backpropagation, using gradient descent.
In this process, the model computes the error of the final layer and passes it back to the previous layer, whose weights and biases are adjusted to reduce the error. The values of the weights and biases are updated using gradient descent. In this algorithm, the derivative of the error in the final layer is computed with respect to each weight and bias. These derivatives (gradients) are scaled by a small step size, commonly called the learning rate, and subtracted from the original values to get the updated values.
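A single gradient descent update for one neuron might look like the sketch below. It assumes a sigmoid activation and a squared-error loss L = (y - a)^2 / 2, both chosen here for illustration because they have simple derivatives. The learning rate of 0.5 and the names gradient_step and squared_error are also assumptions.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def squared_error(w, b, x, y):
    """Assumed loss for this sketch: L = (y - a)^2 / 2."""
    a = sigmoid(sum(w_i * x_i for w_i, x_i in zip(w, x)) + b)
    return 0.5 * (y - a) ** 2


def gradient_step(w, b, x, y, learning_rate=0.5):
    """One gradient descent update for a single sigmoid neuron."""
    z = sum(w_i * x_i for w_i, x_i in zip(w, x)) + b
    a = sigmoid(z)
    # Chain rule: dL/dz = dL/da * da/dz = (a - y) * a * (1 - a)
    dz = (a - y) * a * (1.0 - a)
    # dL/dw_i = dz * x_i and dL/db = dz; subtract the scaled derivatives.
    new_w = [w_i - learning_rate * dz * x_i for w_i, x_i in zip(w, x)]
    new_b = b - learning_rate * dz
    return new_w, new_b


w, b = [0.0, 0.0], 0.0
x, y = [1.0, 2.0], 1.0

before = squared_error(w, b, x, y)
w, b = gradient_step(w, b, x, y)
after = squared_error(w, b, x, y)
print(before, after)  # the error shrinks after the update
```

In a multi-layer network the same chain rule is applied layer by layer, from the output back toward the input.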
The model then forward propagates again to compute the new error with the updated weights, and backpropagates again to update the weights once more. This cycle is repeated many times to drive the error term toward its minimum, and the whole process is called training. During training, the weights and biases are continually adjusted until training data with the same labels consistently yield similar outputs.
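To train a neural network model, the steps described above are carried out in order, starting from some initial weights and biases: forward propagate the inputs, compute the cost, backpropagate the error, update the weights and biases with gradient descent, and repeat until the error stops decreasing.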
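Putting the pieces of this article together, a complete training cycle for a tiny network with one hidden layer could be sketched as follows. Everything specific here is an assumption made for illustration: the toy dataset (logical OR), the starting weights, the sigmoid activation, the squared-error loss, the learning rate, the number of epochs, and the choice to update the weights after every example.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(x, params):
    """Forward propagation: inputs -> hidden layer -> single output node."""
    W1, b1, W2, b2 = params
    hidden = [sigmoid(sum(w * xi for w, xi in zip(row, x)) + b)
              for row, b in zip(W1, b1)]
    output = sigmoid(sum(w * h for w, h in zip(W2, hidden)) + b2)
    return hidden, output


def cost(data, params):
    """Sum of squared-error losses (an assumed loss) over all examples."""
    return sum(0.5 * (y - forward(x, params)[1]) ** 2 for x, y in data)


def train(data, params, learning_rate=0.5, epochs=2000):
    """Repeat forward propagation, backpropagation, and the weight update."""
    W1, b1, W2, b2 = params
    for _ in range(epochs):
        for x, y in data:
            hidden, output = forward(x, (W1, b1, W2, b2))
            # Error derivative at the output node (chain rule through sigmoid).
            d_out = (output - y) * output * (1.0 - output)
            # Pass the error back to each hidden node through its output weight.
            d_hidden = [d_out * w * h * (1.0 - h) for w, h in zip(W2, hidden)]
            # Gradient descent: subtract learning_rate * gradient.
            W2 = [w - learning_rate * d_out * h for w, h in zip(W2, hidden)]
            b2 = b2 - learning_rate * d_out
            W1 = [[w - learning_rate * d * xi for w, xi in zip(row, x)]
                  for row, d in zip(W1, d_hidden)]
            b1 = [b - learning_rate * d for b, d in zip(b1, d_hidden)]
    return W1, b1, W2, b2


# Toy labeled data (logical OR): predictors x and target y.
data = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0),
        ([1.0, 0.0], 1.0), ([1.0, 1.0], 1.0)]
# Arbitrary small starting values: 2 inputs, 2 hidden nodes, 1 output node.
initial = ([[0.1, -0.2], [0.3, 0.4]], [0.0, 0.0], [0.2, -0.1], 0.0)

trained = train(data, initial)
print("cost before:", cost(data, initial))
print("cost after: ", cost(data, trained))
```

The cost after training should be lower than the cost before. Real projects typically rely on a library that handles the derivatives automatically rather than hand-written loops like this one.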
In this article, we discussed the overall working of an AI neural network model. In the next article, we will discuss how to implement a neural network in Python. Feel free to share your comments.