
Understanding The Essential Blocks of Artificial Intelligence - Neural Networks

Shivam Bansal
17th Apr, 2018

Artificial Intelligence, Machine Learning, Deep Learning, and Neural Networks are all buzzwords right now. This article is the first in a planned series focused on understanding and implementing the concepts of deep learning and artificial intelligence.

“Artificial Intelligence is the new electricity. Similar to how electricity revolutionized a number of industries about a hundred years ago, Artificial Intelligence will transform and revolutionize different industries” - Andrew Ng

Artificial Intelligence and Deep Learning are becoming some of the most important components of modern-day businesses. Smart, intelligent systems are regularly built to solve use cases that were earlier thought too complex to solve. Examples include automatic speech recognition in smartphones, conversational chatbots, image classification and clustering in search engines, and natural language generation and understanding.

At a broader level, Artificial Intelligence and Deep Learning aim to create intelligent machines. At a much deeper level, they comprise mathematical relationships, sophisticated optimization algorithms, and models that generate intelligence in the form of predictions, segmentation, clustering, forecasting, classification, etc.

Neural Networks are the building blocks of every Artificial Intelligence and Deep Learning model. In this tutorial, we will walk through how neural networks work and the science behind them.

Contents:

  1. Introduction to Neural Networks
  2. Single Processing Units - Neurons
  3. Activation Functions
  4. Forward Propagation
  5. Backward Propagation


Introduction to Artificial Neural Network Terminology

A neural network is a mathematical model designed to function similarly to biological neurons and the nervous system. These models are used to recognize complex patterns and relationships within labeled data. A labeled dataset contains two types of variables - predictors (features used as independent variables in the model) and targets (features treated as dependent variables of the model). Some examples:

  1. Employee data containing information such as age, gender, experience, and skills, labeled with the salary amount (an example of numerical data)
  2. Tweets labeled as positive, negative, or neutral (an example of text data)
  3. Images of animals labeled with the name of the animal (an example of image data)
  4. Audio recordings of music labeled with the genre of the music (an example of audio data)

The core structure of a Neural Network model comprises a large number of simple processing nodes that are interconnected and organized in layers. An individual node in a layer is connected to several nodes in the previous and next layers. Inputs from one layer are received and processed to generate an output, which is passed on to the next layer.

[Figure: Artificial Neural Network terminology]
The first layer of this architecture is called the input layer, which accepts the inputs; the last layer is the output layer, which produces the output; and every layer between them is called a hidden layer. Let us now look at how an artificial neural network works.


Single Processing Units - Neurons

A neuron is the smallest unit in a neural network. It accepts various inputs, applies a function and computes the output.

[Figure: Single Processing Units]
Each incoming connection corresponds to a different input to a node. To each of its connections, the node assigns a number known as a “weight”. The weight of an input variable signifies the importance, or priority, of that variable relative to the other variables. The node multiplies each input value by its associated weight and adds the resulting products together. It also adds a term called the “bias”, which lets the learned function shift left or right. This sum is then passed through an activation function (described in the next section), which maps the inputs to the target output values.

Let’s understand this through an example. Suppose you want to purchase a new house and will make the decision based on the following factors, in order of priority:

  • Cost of the Property
  • Square Feet Area
  • Construction Year
  • Availability of Security Systems
  • Nearby Amenities
  • Climate Factors in the Locality
  • Crime Rate in the Locality

The best way to formalize this decision is to formulate a mathematical equation with:

  • Every factor is represented as x1, x2, x3, …
  • Every factor’s priority is represented as a weight: w1, w2, w3, …
  • The node input (Z) is the weighted sum of the factors and their weights
  • The node output (A) is the value of the mapping function g(Z)

Inputs: x1, x2, x3, …
Weights: w1, w2, w3, …
Bias term: b
Node Input (Z) = w1*x1 + w2*x2 + w3*x3 + … + w7*x7 + b
Node Output (A) = g(Z)

Here the function g is known as the activation function. Let’s look at how it works.
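This single-neuron computation can be sketched in Python. The factor values, weights, and bias below are illustrative assumptions (not learned values), and a simple step function, described in the next section, stands in for g:

```python
# A single neuron: weighted sum of inputs plus bias, passed through an
# activation function g. All numeric values here are illustrative, not learned.

def step(z):
    # Step activation: 1 if the input exceeds 0, otherwise 0
    return 1 if z > 0 else 0

def neuron_output(x, w, b):
    # Node input: Z = w1*x1 + w2*x2 + ... + w7*x7 + b
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    # Node output: A = g(Z)
    return step(z)

# Seven house factors (cost, area, year, security, amenities, climate, crime),
# each scaled to [0, 1]
x = [0.8, 0.6, 0.9, 1.0, 0.7, 0.5, 0.3]
# Priorities: crime rate gets a negative weight because it counts against buying
w = [0.5, 0.3, 0.1, 0.05, 0.03, 0.01, -0.4]
b = -0.5  # the bias shifts the decision boundary

print(neuron_output(x, w, b))  # 1 = "buy house", 0 = "do not buy"
```

Note that a more negative bias makes the neuron harder to activate: with the same inputs and weights but b = -1.0, the output flips to 0.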


Activation Functions - Applying Non-Linear Transformations

The main goal of an activation function is to apply a non-linear transformation on the input to map it to the output. For example, a linear combination of the 7 house-related variables is mapped into two target output classes: “Buy the Property” and “Do not buy the Property”.

The decision boundary of the output is given by a threshold. If the generated value is below the threshold, the node outputs 0; otherwise it outputs 1. The two outputs (0, 1) correspond to the two decisions. In our example, A is the generated output (the activation), and the bias b acts as a negative threshold: if the weighted sum of the inputs and their weights exceeds -b, you buy the property; otherwise you do not. i.e.

If WX + b > 0, the output is A = 1 (“buy house”)
If WX + b ≤ 0, the output is A = 0 (“do not buy”)
This is the basic equation of the activation function applied to each neuron. In the example above, we applied the step function as the activation function.
There are other choices of nonlinear activation function, such as ReLU, sigmoid, and tanh:

[Figure: Nonlinear activation functions]
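These activation functions can be sketched with plain standard-library Python (minimal implementations for illustration):

```python
import math

# Common non-linear activation functions. The step function was used in the
# house example; ReLU, sigmoid, and tanh are smoother alternatives.

def step(z):
    # Hard threshold: 0 or 1
    return 1.0 if z > 0 else 0.0

def relu(z):
    # ReLU: passes positive values through, zeros out negative ones
    return max(0.0, z)

def sigmoid(z):
    # Sigmoid: squashes any real number into (0, 1)
    return 1.0 / (1.0 + math.exp(-z))

def tanh(z):
    # Tanh: squashes any real number into (-1, 1)
    return math.tanh(z)

for g in (step, relu, sigmoid, tanh):
    print(g.__name__, g(-2.0), g(0.5))
```

The step function gives hard 0/1 decisions, while sigmoid and tanh produce smooth, differentiable outputs, which matters later for computing derivatives during backpropagation.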


Forward Propagation

A neural network model goes through a process called forward propagation, in which it propagates the inputs and their activations forward through the layers to get the final output. The computation steps involved are:

Z = W*X + b
A = g(Z)

g is the activation function
A is the predicted output for the input variables X
W is the weight matrix
b is the bias vector
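The forward pass for a single layer can be sketched in pure Python. The input values, weights, layer sizes, and the sigmoid activation below are illustrative assumptions:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward_layer(X, W, b):
    # Z = W*X + b : one weighted sum per node in the layer
    Z = [sum(w_i * x_i for w_i, x_i in zip(row, X)) + b_j
         for row, b_j in zip(W, b)]
    # A = g(Z) : apply the activation function element-wise
    return [sigmoid(z) for z in Z]

X = [0.5, 0.2, 0.1]            # 3 input features
W = [[0.1, -0.2, 0.3],         # weight matrix: 4 nodes x 3 inputs each
     [0.0, 0.4, -0.1],
     [0.2, 0.2, 0.2],
     [-0.3, 0.1, 0.0]]
b = [0.0, 0.0, 0.0, 0.0]       # bias vector, one entry per node
A = forward_layer(X, W, b)
print(len(A))                  # one activation per node in the layer
```

Stacking layers is just repeated application: the list A becomes the input X of the next layer, until the output layer is reached.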

To optimize the weight and bias matrices used in the neural network, the model computes the error in the output and makes small changes to their values, chosen so that the overall error is reduced. This process of error computation and weight optimization is expressed through two functions: the loss function and the cost function.


Error Computation: Loss Function and Cost Function

The loss function measures the error between the predicted value and the actual value. One simplistic loss function is the difference between the actual value and the predicted value.

Loss = Y - A
Y = Actual Value, A = Predicted Value

The cost function computes the sum of the loss function values over every training example.
Cost Function = Summation (Loss Function)
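A minimal sketch of these two functions. It uses the squared error rather than the raw difference Y - A, so that positive and negative errors do not cancel when summed; this choice, and the averaging in the cost, are assumptions for illustration:

```python
# Loss for one training example, and cost over the whole training set.

def loss(y, a):
    # Squared error for a single example (y = actual, a = predicted)
    return (y - a) ** 2

def cost(Y, A):
    # Cost = average of the per-example losses
    return sum(loss(y, a) for y, a in zip(Y, A)) / len(Y)

Y = [1, 0, 1, 1]          # actual values
A = [0.9, 0.2, 0.6, 0.8]  # predicted values
print(cost(Y, A))
```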


Minimizing the Cost Function - Backpropagation using Gradient Descent

The final step is to minimize the error term (the cost function) and obtain optimal values for the weight and bias terms. A neural network model does this through a process called backpropagation with gradient descent.

In this process, the model computes the error of the final layer and passes it backward to the previous layer. The weights and biases associated with that layer are adjusted to reduce the error. The values of the weights and biases are updated using gradient descent: the derivative of the error is computed with respect to each weight and bias, and these derivatives (scaled by a learning rate) are subtracted from the current values to obtain the updated ones.

The model then forward propagates again to compute the new error with the new weights, and backward propagates to update the weights once more. This process is repeated many times to minimize the error term, and is what we call training. During training, the weights and biases are continually adjusted until training examples with the same label consistently yield similar outputs.
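The update step can be sketched for a single weight and bias. The squared loss, sigmoid activation, learning rate, and the chain-rule derivation below are illustrative assumptions:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def gradient_descent_step(w, b, x, y, lr=0.1):
    # Forward pass: prediction for one example with a single weight
    z = w * x + b
    a = sigmoid(z)
    # Backward pass: derivative of the squared loss (y - a)^2 with respect
    # to w and b, via the chain rule through the sigmoid
    dloss_da = -2 * (y - a)
    da_dz = a * (1 - a)       # derivative of the sigmoid
    dw = dloss_da * da_dz * x
    db = dloss_da * da_dz * 1.0
    # Update: subtract the derivatives, scaled by the learning rate
    return w - lr * dw, b - lr * db

w, b = 0.0, 0.0
for _ in range(100):
    w, b = gradient_descent_step(w, b, x=1.0, y=1.0)
print(w, b)  # both become positive, pushing the prediction toward y = 1
```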


Recap

To train a neural network model, the following steps are implemented:

  1. Design the architecture of the neural network model: the number of layers, the number of neurons in each layer, the activation functions, etc.
  2. For every training example in the input data, compute the activations using the activation functions.
  3. Forward propagate the activations from the input layer through the hidden layers to the output layer.
  4. Compute the error of the final layer using the loss function.
  5. Compute the sum of errors over every training example using the cost function.
  6. Backpropagate the error to the previous layers and compute the derivative of the error with respect to the weight and bias parameters.
  7. Using the gradient descent algorithm, subtract the weight and bias derivative terms (scaled by a learning rate) from the current values.
  8. Repeat for a large number of iterations (epochs) to obtain a stable neural network model.
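The steps above can be tied together in a tiny end-to-end sketch: a single-neuron network trained by gradient descent on a toy AND-gate dataset. The dataset, squared loss, learning rate, and epoch count are all illustrative assumptions:

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Toy labeled data: an AND gate (two predictors, one binary target)
data = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]

w, b, lr = [0.0, 0.0], 0.0, 0.5

for epoch in range(2000):                                 # step 8: many epochs
    for x, y in data:
        z = sum(wi * xi for wi, xi in zip(w, x)) + b      # steps 2-3: forward
        a = sigmoid(z)
        grad = -2 * (y - a) * a * (1 - a)                 # steps 4-6: derivative
        w = [wi - lr * grad * xi for wi, xi in zip(w, x)] # step 7: update
        b = b - lr * grad

preds = [round(sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b))
         for x, _ in data]
print(preds)  # should approach [0, 0, 0, 1]
```

A single neuron suffices here because the AND gate is linearly separable; problems that are not (such as XOR) are where hidden layers become necessary.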

In this article, we discussed the overall working of an AI neural network model. In the next article, we will discuss how to implement a neural network in Python. Feel free to share your comments.


Shivam Bansal

Blog author

Shivam Bansal is a Data Scientist with core experience in building automated solutions using natural language processing, machine learning, data visualisation, and NoSQL databases for industries such as healthcare, social media, insurance, and e-commerce.
