
Implementing a Neural Network with Python in 15 Minutes

Shivam Bansal
Blog
06th Aug, 2018

In the last article, I discussed the fundamental concepts of deep learning and artificial intelligence - Neural Networks. In this article, I will discuss how to implement a neural network in Python to classify Cat and Non-Cat images.

Before implementing a Neural Network model in Python, it is important to understand the working and implementation of the underlying classification model called Logistic Regression. Logistic Regression uses the sigmoid (logistic) function to map inputs to probabilities, which can then be used to classify data into categories. The sigmoid function is defined as 1 / (1 + exp(-z)), where z is the input vector. The output of the sigmoid function always lies between 0 and 1, as represented in the following graph:

[Graph: the sigmoid function, an S-shaped curve whose output rises from 0 to 1]


Let us take a dataset of labelled images containing two classes - Cat and Non-Cat. We need to build a classifier that predicts whether a given image is a Cat image or a Non-Cat image.

Let us first prepare the dataset that will be used for classification. In this step, we will load the dataset into x (predictor) and y (target) variables. We will then preprocess the data to normalize it. Since the data consists of images, the preprocessing step will include flattening each image's pixel values into a single vector and dividing by the maximum pixel value.
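As a sketch, the loading and preprocessing step might look like the following. The dataset shapes (10 RGB images of 64x64 pixels) and the variable names are illustrative assumptions; a real Cat/Non-Cat dataset would be loaded from disk rather than generated randomly:

```python
import numpy as np

# Hypothetical stand-in for the real dataset: 10 RGB images of 64x64 pixels
x = np.random.randint(0, 256, size=(10, 64, 64, 3))
y = np.random.randint(0, 2, size=(1, 10))   # 1 = Cat, 0 = Non-Cat

# Flatten each image into a single column vector of pixel values
x_flat = x.reshape(x.shape[0], -1).T        # shape: (64*64*3, 10)

# Normalize by dividing by the maximum pixel value
x_norm = x_flat / 255.0
```

Each column of `x_norm` now holds one training example, with every pixel value scaled into the [0, 1] range.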


To implement a Logistic Regression classification model, three important steps are required:

  1. Compute the sigmoid activation of inputs

  2. Compute the error term (cost function)

  3. Optimize the model weights using a cost function 

1. Create a function to compute the sigmoid activation of the input data.
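One way to write this function with NumPy (a sketch; the article's own implementation was shown as an image):

```python
import numpy as np

def sigmoid(z):
    """Compute the sigmoid activation 1 / (1 + exp(-z)) element-wise."""
    return 1.0 / (1.0 + np.exp(-z))
```

Because NumPy broadcasts element-wise, the same function works for scalars, vectors, and matrices of inputs.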


2. Create a function to compute the Cost Function and obtain the weights derivatives

To compute the cost function, we first need to compute the loss function. The loss function is the measure of error in the prediction value (activation) computed using the sigmoid activation. The simplest loss function can be defined as the mean of the squared difference between the predicted value and the actual value. If A is the activation (prediction), and Y is the actual value, the loss function can be defined as:

Loss = (1/m) * (A - Y)^2

As we will discuss in the next step, we need to find the global minimum of this function using an optimization algorithm. One problem with this loss function is that the resulting optimization problem is non-convex: the cost function has multiple local minima, and gradient descent may not converge to the global minimum. A better loss function which overcomes this problem is the log loss function, defined as:

Loss = -Y * log(A) - (1 - Y) * log(1 - A)

The cost function is defined as the sum of loss function values for every input in the training data.
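A sketch of this step might look like the following. The function name `propagate` and the dictionary of gradients are my choices, not the article's; the cost here is divided by m (averaged) so its scale does not depend on the dataset size:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def propagate(w, b, X, Y):
    """Compute the log-loss cost and the derivatives of the weights and bias.

    w: weights, shape (n, 1); b: scalar bias
    X: inputs, shape (n, m); Y: labels, shape (1, m)
    """
    m = X.shape[1]
    A = sigmoid(np.dot(w.T, X) + b)                       # activations
    cost = -np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A)) / m
    dw = np.dot(X, (A - Y).T) / m                         # derivative w.r.t. w
    db = np.sum(A - Y) / m                                # derivative w.r.t. b
    return {"dw": dw, "db": db}, cost
```

With zero weights and a balanced pair of labels, the activations are all 0.5 and the cost equals log(2), which is a handy sanity check.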




3. Create a function to optimize the model weights (train the model)

Finally, to train the model, an optimization algorithm is used which minimises the cost function and updates the model weights with the optimized values. An optimization algorithm such as gradient descent is used for this purpose.




Gradient descent minimises the cost function value by making small adjustments to the model weights. It iterates over the training examples and computes the derivatives (small changes) of the model weights so that the cost is minimised. These derivatives are then subtracted from the original model weights to get the updated values.

Great, now our core functions are implemented. Let's compile the entire code together to get the optimized values of the model weights.
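Putting the pieces together, a minimal training loop might look like this. The sigmoid and gradient computations are repeated inline so the block is self-contained, and the hyperparameter values are illustrative assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def model(X, Y, num_iterations=2000, learning_rate=0.5):
    """Train logistic regression with gradient descent; return w and b."""
    n, m = X.shape
    w, b = np.zeros((n, 1)), 0.0
    for _ in range(num_iterations):
        A = sigmoid(np.dot(w.T, X) + b)    # forward pass
        dw = np.dot(X, (A - Y).T) / m      # derivative of cost w.r.t. w
        db = np.sum(A - Y) / m             # derivative of cost w.r.t. b
        w -= learning_rate * dw            # subtract derivatives from weights
        b -= learning_rate * db
    return w, b
```

On a tiny linearly separable dataset this loop quickly finds weights that classify every training example correctly.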



Let's define a predict function which can be used to make predictions on test data.
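A sketch of such a function, which thresholds the sigmoid activation at 0.5:

```python
import numpy as np

def predict(w, b, X):
    """Predict 1 (Cat) if the sigmoid activation exceeds 0.5, else 0 (Non-Cat)."""
    A = 1.0 / (1.0 + np.exp(-(np.dot(w.T, X) + b)))
    return (A > 0.5).astype(int)
```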




Now, it's time to implement a neural network model along the same lines. The architecture of the neural network is:


[Diagram: the network's layers - input layer, hidden layer, and output layer]


The first layer is called the input layer and consists of the inputs. The next layer is the hidden layer, which comprises several neurons; every neuron computes its activation value using the tanh activation function. The final layer is the output layer, which computes the sigmoid activation of the input it receives from the hidden layer. The steps to implement a neural network are as follows:

1. Define the neural network structure ( # of input units,  # of hidden units, etc)
2. Initialize the model's parameters
3. For a number of epochs:
    - Implement forward propagation
    - Compute loss
    - Implement backward propagation to get the gradients
    - Update parameters (gradient descent)

1. Define the neural network architecture

In the first step, we define the architecture of the neural network, which consists of defining the number of nodes in the input layer, the hidden layer, and the output layer.
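A sketch of this step (the hidden-layer size of 4 is a design choice, not something fixed by the data):

```python
import numpy as np

def layer_sizes(X, Y, n_h=4):
    """Return (n_x, n_h, n_y): input, hidden, and output layer sizes.

    The hidden-layer size n_h is a hyperparameter; 4 is an illustrative default.
    """
    n_x = X.shape[0]   # number of input features
    n_y = Y.shape[0]   # number of outputs
    return n_x, n_h, n_y
```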



2. Initialize the parameters 

Next, we need to implement a function that initializes the weights and biases of the model to random values. It is important to initialize the weights to small random values to break the symmetry among different neurons: if the weights are all initialized to zeros, every neuron in a layer computes the same output and receives the same updates, so they all learn the same function. The neural network would then fail to compute complex non-linear activation values and would merely behave like a logistic regression implementation.
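A sketch of the initialization (the 0.01 scaling factor keeps the initial weights small; the biases can safely start at zero because the random weights already break symmetry):

```python
import numpy as np

def initialize_parameters(n_x, n_h, n_y, seed=1):
    """Small random weights break symmetry; biases can start at zero."""
    rng = np.random.default_rng(seed)
    return {
        "W1": rng.standard_normal((n_h, n_x)) * 0.01,  # hidden-layer weights
        "b1": np.zeros((n_h, 1)),                      # hidden-layer biases
        "W2": rng.standard_normal((n_y, n_h)) * 0.01,  # output-layer weights
        "b2": np.zeros((n_y, 1)),                      # output-layer bias
    }
```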




3.1 Implement Forward Propagation

The next step is to compute the forward propagation activations. We will use tanh in the hidden layer and sigmoid function in the output layer. 
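This can be sketched as follows; the cache of intermediate values is returned because back propagation will need them later:

```python
import numpy as np

def forward_propagation(X, params):
    """tanh in the hidden layer, sigmoid in the output layer."""
    Z1 = np.dot(params["W1"], X) + params["b1"]
    A1 = np.tanh(Z1)                                  # hidden-layer activation
    Z2 = np.dot(params["W2"], A1) + params["b2"]
    A2 = 1.0 / (1.0 + np.exp(-Z2))                    # sigmoid output
    return A2, {"Z1": Z1, "A1": A1, "Z2": Z2, "A2": A2}
```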




3.2 Compute the cost 

Now, we need to compute the cost (error term), which can be computed using the log loss function that we discussed in the logistic regression section. To recap, the cost function is the sum of the loss function values for every input in the training examples.

Loss = -Y * log(A) - (1 - Y) * log(1 - A)
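A sketch of the cost computation (averaged over the m examples, so the cost scale does not grow with the dataset):

```python
import numpy as np

def compute_cost(A2, Y):
    """Mean log loss over the m training examples."""
    m = Y.shape[1]
    return -np.sum(Y * np.log(A2) + (1 - Y) * np.log(1 - A2)) / m
```

When every activation is 0.5, the cost equals log(2) regardless of the labels, which makes for an easy sanity check.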




3.3 Compute Back Propagation

Now, the cost term will be used to obtain the derivatives of the model weights (the small changes in the weights that compensate for the error term). The derivative values will be used to update the original values of the model weights during the optimization (training) process.
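A sketch of back propagation for this architecture. The formulas follow from differentiating the log loss through the sigmoid output and the tanh hidden layer (tanh'(z) = 1 - tanh(z)^2); the function and dictionary key names are mine:

```python
import numpy as np

def backward_propagation(params, cache, X, Y):
    """Gradients of the log loss w.r.t. every weight and bias."""
    m = X.shape[1]
    A1, A2 = cache["A1"], cache["A2"]
    dZ2 = A2 - Y                                         # output-layer error
    dW2 = np.dot(dZ2, A1.T) / m
    db2 = np.sum(dZ2, axis=1, keepdims=True) / m
    dZ1 = np.dot(params["W2"].T, dZ2) * (1 - A1 ** 2)    # tanh derivative
    dW1 = np.dot(dZ1, X.T) / m
    db1 = np.sum(dZ1, axis=1, keepdims=True) / m
    return {"dW1": dW1, "db1": db1, "dW2": dW2, "db2": db2}
```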




3.4 Update the parameters

In the final step, we run an optimization algorithm for a number of iterations. In every iteration, the cost function and the derivatives of the weights and biases with respect to the cost function are computed. These derivatives are then used to update the original values of the weights and biases. The process is repeated for many iterations with the aim of minimising the cost function value.
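The update step itself can be sketched as a one-line-per-parameter gradient descent step (the learning rate is an illustrative value):

```python
import numpy as np

def update_parameters(params, grads, learning_rate=1.2):
    """Subtract each derivative from the corresponding parameter."""
    return {
        "W1": params["W1"] - learning_rate * grads["dW1"],
        "b1": params["b1"] - learning_rate * grads["db1"],
        "W2": params["W2"] - learning_rate * grads["dW2"],
        "b2": params["b2"] - learning_rate * grads["db2"],
    }
```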




All of the essential components are now implemented. Let's compile all of the above to implement a neural network.
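Compiled into one self-contained sketch, the full training loop might look like this (initialization, forward propagation, back propagation, and the update are inlined; the hyperparameter defaults are illustrative):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def nn_model(X, Y, n_h=4, num_iterations=2000, learning_rate=1.2, seed=1):
    """Train the two-layer network: tanh hidden layer, sigmoid output."""
    n_x, n_y, m = X.shape[0], Y.shape[0], X.shape[1]
    rng = np.random.default_rng(seed)
    # Small random weights to break symmetry; zero biases
    W1 = rng.standard_normal((n_h, n_x)) * 0.01
    b1 = np.zeros((n_h, 1))
    W2 = rng.standard_normal((n_y, n_h)) * 0.01
    b2 = np.zeros((n_y, 1))
    for _ in range(num_iterations):
        # Forward propagation
        A1 = np.tanh(np.dot(W1, X) + b1)
        A2 = sigmoid(np.dot(W2, A1) + b2)
        # Backward propagation
        dZ2 = A2 - Y
        dW2 = np.dot(dZ2, A1.T) / m
        db2 = np.sum(dZ2, axis=1, keepdims=True) / m
        dZ1 = np.dot(W2.T, dZ2) * (1 - A1 ** 2)
        dW1 = np.dot(dZ1, X.T) / m
        db1 = np.sum(dZ1, axis=1, keepdims=True) / m
        # Gradient descent update
        W1 -= learning_rate * dW1
        b1 -= learning_rate * db1
        W2 -= learning_rate * dW2
        b2 -= learning_rate * db2
    return {"W1": W1, "b1": b1, "W2": W2, "b2": b2}
```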



Generate a synthetic dataset using the following function and train the neural network model 
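The article's dataset-generation function was shown as an image; as a stand-in, here is a simple synthetic two-class dataset built from two Gaussian blobs (an assumption, not the original function):

```python
import numpy as np

def make_blobs_dataset(m=200, seed=0):
    """Hypothetical synthetic dataset: two 2-D Gaussian blobs, labels 0/1."""
    rng = np.random.default_rng(seed)
    m2 = m // 2
    X0 = rng.standard_normal((2, m2)) + np.array([[2.0], [2.0]])   # class 0
    X1 = rng.standard_normal((2, m2)) - np.array([[2.0], [2.0]])   # class 1
    X = np.hstack([X0, X1])
    Y = np.hstack([np.zeros((1, m2)), np.ones((1, m2))])
    return X, Y
```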



Implement a function that predicts the values and computes the accuracy.


Compute the accuracy on test set
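A sketch of the prediction and accuracy functions, reusing the same forward pass as training (the `params` dictionary keys match the sketches above):

```python
import numpy as np

def predict(params, X):
    """Forward pass through the trained network; threshold at 0.5."""
    A1 = np.tanh(np.dot(params["W1"], X) + params["b1"])
    A2 = 1.0 / (1.0 + np.exp(-(np.dot(params["W2"], A1) + params["b2"])))
    return (A2 > 0.5).astype(int)

def accuracy(predictions, Y):
    """Fraction of examples where the prediction matches the label."""
    return float(np.mean(predictions == Y))
```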


And we have trained a neural network model which achieves about 90% accuracy on the test dataset. Feel free to share your views, thoughts, and queries in the comment section.


Shivam Bansal

Blog author

Shivam Bansal is a Data Scientist with core experience in building automated solutions using natural language processing, machine learning, data visualisation, and NoSQL databases for industries such as healthcare, social media, insurance, and e-commerce.
