In the last article, I discussed the fundamental concepts of deep learning and artificial intelligence: neural networks. In this article, I will discuss how to implement a neural network in Python to classify images as Cat or Non-Cat.
Before implementing a neural network model in Python, it is important to understand the working and implementation of the underlying classification model, logistic regression. Logistic regression uses the sigmoid function (also called the logistic function; it is the inverse of the logit function) to classify data into two categories. The sigmoid function is defined as 1 / (1 + exp(-z)), where z is a linear combination of the inputs (the weights multiplied by the input vector, plus a bias). The output of the sigmoid function always lies between 0 and 1, as represented in the following graph:
Let us take a dataset of labelled images with two types of labels: Cat and Non-Cat. We need to build a classifier that predicts whether a given image is a Cat or a Non-Cat image.
Let us first prepare the dataset that will be used for classification. In this step, we will first load the dataset into x (predictor) and y (target) variables. We will then preprocess the data to normalize it. Since the data consists of images, the preprocessing step flattens each image's pixel values into a single vector and divides them by the maximum pixel value (255 for 8-bit images).
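As a rough illustration of the preprocessing step, here is a minimal sketch. The actual cat dataset and its loader are not shown here, so the random `images` array, its 64x64 RGB shape, and the `preprocess_images` helper are all assumptions made only for this example.

```python
import numpy as np

def preprocess_images(images):
    """Flatten each image into a column vector and scale pixels to [0, 1]."""
    m = images.shape[0]
    # (m, height, width, channels) -> (height*width*channels, m): one column per image
    flat = images.reshape(m, -1).T
    # 255 is the maximum value of an 8-bit pixel
    return flat / 255.0

# Hypothetical stand-in data: 10 random 64x64 RGB images with binary labels
rng = np.random.default_rng(0)
images = rng.integers(0, 256, size=(10, 64, 64, 3), dtype=np.uint8)
y = rng.integers(0, 2, size=(1, 10))   # 1 = Cat, 0 = Non-Cat

x = preprocess_images(images)
print(x.shape, y.shape)   # (12288, 10) (1, 10)
```

Storing one image per column keeps the later matrix products (weights times inputs) simple.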
To implement a Logistic Regression classification model, three important steps are required:
Compute the sigmoid activation of inputs
Compute the error term (cost function)
Optimize the model weights using a cost function
1. Create a function to compute the sigmoid activation of the input data.
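A minimal sketch of such a function, assuming NumPy is used for the array arithmetic (the name `sigmoid` is chosen here for illustration):

```python
import numpy as np

def sigmoid(z):
    """Sigmoid activation, 1 / (1 + exp(-z)), applied element-wise."""
    z = np.asarray(z, dtype=float)
    return 1.0 / (1.0 + np.exp(-z))

print(sigmoid(0))                        # 0.5
print(sigmoid(np.array([-2.0, 2.0])))    # two values that sum to 1
```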
2. Create a function to compute the cost function and obtain the weight derivatives
To compute the cost function, we first need to compute the loss function. The loss function measures the error in the predicted value (activation) computed using the sigmoid activation. The simplest loss function can be defined as the mean of the squared difference between the predicted value and the actual value. If A is the activation (prediction), Y is the actual value, and m is the number of training examples, this loss function can be defined as:
Loss = (1/m) * Sum over all examples of (A - Y)^2
As we will discuss in the next step, we need to find the global minimum of this function using an optimization algorithm. One problem with this loss function is that, combined with the sigmoid activation, it makes the optimization problem non-convex. This means that the cost function can have multiple local minima, and gradient descent may not converge to the global minimum. A better loss function, which overcomes this problem, is the log loss function, defined as:
Loss = -Y * Log (A) - (1-Y) * Log (1 - A)
The cost function is defined as the sum of loss function values for every input in the training data.
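The cost and its derivatives can be sketched as follows. This is an illustration under a few assumptions: the function name `propagate` and the array shapes are chosen here, and the cost is divided by the number of examples m (an average rather than a plain sum), a common variant that only rescales the cost and its gradients.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def propagate(w, b, X, Y):
    """Return the log-loss cost and its gradients for logistic regression.

    w: (n, 1) weights, b: scalar bias, X: (n, m) inputs, Y: (1, m) labels.
    """
    m = X.shape[1]
    A = sigmoid(w.T @ X + b)                         # predictions, shape (1, m)
    cost = -np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A)) / m
    dw = X @ (A - Y).T / m                           # d(cost)/dw, shape (n, 1)
    db = np.sum(A - Y) / m                           # d(cost)/db, scalar
    return cost, dw, db

# Tiny worked example: with zero weights every prediction is 0.5
w, b = np.zeros((2, 1)), 0.0
X = np.array([[1.0, 2.0], [3.0, 4.0]])
Y = np.array([[1, 0]])
cost, dw, db = propagate(w, b, X, Y)
print(cost)   # log(2), about 0.693
```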
3. Create a function to optimize the model weights (train the model)
Finally, to train the model, an optimization algorithm such as gradient descent is used to minimise the cost function and update the model weights with the optimized values.
Gradient descent minimises the cost function by making small adjustments to the model weights. In every iteration, it uses the training examples to compute the derivatives of the cost with respect to the model weights, that is, how much the cost changes for a small change in each weight. These derivatives, scaled by a learning rate, are then subtracted from the current model weights to get the updated values.
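A minimal sketch of a gradient descent loop for logistic regression is given below. The function name `optimize`, the default learning rate, the iteration count, and the toy one-feature dataset are assumptions made for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def optimize(w, b, X, Y, num_iterations=1000, learning_rate=0.1):
    """Run gradient descent on the log-loss cost; return weights, bias, cost history."""
    m = X.shape[1]
    costs = []
    for i in range(num_iterations):
        A = sigmoid(w.T @ X + b)
        cost = -np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A)) / m
        dw = X @ (A - Y).T / m
        db = np.sum(A - Y) / m
        # Step against the gradient, scaled by the learning rate
        w = w - learning_rate * dw
        b = b - learning_rate * db
        if i % 100 == 0:
            costs.append(cost)       # record the cost every 100 iterations
    return w, b, costs

# Toy data: one feature, label is 1 for the larger values
X = np.array([[0.0, 1.0, 2.0, 3.0]])
Y = np.array([[0, 0, 1, 1]])
w, b, costs = optimize(np.zeros((1, 1)), 0.0, X, Y)
print(costs[0], costs[-1])           # the cost should decrease over training
```

If the learning rate is too large, the cost can oscillate or grow instead of shrinking, so it is worth watching the recorded costs.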
Great, now our core functions are implemented. Let's put the entire code together to get the optimized values of the model weights.
Let's define a predict function which can be used to make predictions on test data
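Putting training and prediction together might look like the sketch below. The cat images are not available here, so the example uses a hypothetical, randomly generated two-feature dataset; the names `train` and `predict`, the 0.5 decision threshold, and the hyperparameters are likewise assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train(X, Y, num_iterations=2000, learning_rate=0.5):
    """Fit logistic regression weights with gradient descent."""
    n, m = X.shape
    w, b = np.zeros((n, 1)), 0.0
    for _ in range(num_iterations):
        A = sigmoid(w.T @ X + b)
        w = w - learning_rate * (X @ (A - Y).T) / m
        b = b - learning_rate * np.sum(A - Y) / m
    return w, b

def predict(w, b, X):
    """Label an example 1 when its predicted probability exceeds 0.5."""
    return (sigmoid(w.T @ X + b) > 0.5).astype(int)

# Hypothetical data: label is 1 when the two features sum to a positive number
rng = np.random.default_rng(0)
X = rng.normal(size=(2, 200))
Y = (X[0:1] + X[1:2] > 0).astype(int)
X_train, Y_train = X[:, :150], Y[:, :150]
X_test, Y_test = X[:, 150:], Y[:, 150:]

w, b = train(X_train, Y_train)
accuracy = np.mean(predict(w, b, X_test) == Y_test)
print(f"Test accuracy: {accuracy:.2%}")
```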
Now, it's time to implement a neural network model along the same lines. The architecture of the neural network is:
The first layer is called the input layer and consists of the inputs. The next layer is the hidden layer, which is made up of several neurons. Every neuron computes its activation value using the tanh activation function. The final layer is the output layer, which computes the sigmoid activation of the input received from the hidden layer. The steps to implement the neural network are as follows:
1. Define the neural network structure (number of input units, number of hidden units, etc.)
2. Initialize the model's parameters
3. For a number of epochs:
- Implement forward propagation
- Compute loss
- Implement backward propagation to get the gradients
- Update parameters (gradient descent)
1. Define the neural network architecture
In the first step, we define the architecture of the neural network, which means choosing the number of nodes in the input layer, the hidden layer, and the output layer.
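One way to express this step, assuming the inputs are stored with one example per column; the function name `layer_sizes` and the default of 4 hidden units are illustrative choices:

```python
import numpy as np

def layer_sizes(X, Y, n_h=4):
    """Return (input units, hidden units, output units) for the network."""
    n_x = X.shape[0]   # one input unit per feature
    n_y = Y.shape[0]   # one output unit per target row
    return n_x, n_h, n_y

X = np.zeros((2, 5))   # 2 features, 5 examples
Y = np.zeros((1, 5))   # 1 binary label per example
print(layer_sizes(X, Y))   # (2, 4, 1)
```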
2. Initialize the parameters
Next, we need to implement a function that initializes the weights and biases of the model. It is important to initialize the weights to small random values (the biases can start at zero) to break the symmetry among different neurons. Symmetry here means that if all the weights are initialized to zero, every neuron in a layer computes the same output and receives the same update, so the neurons never learn different features. The neural network then cannot compute complex non-linear functions and is little more expressive than a logistic regression model.
3.1 Implement Forward Propagation
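A minimal sketch of the initialization, assuming a single hidden layer. The dictionary keys `W1`, `b1`, `W2`, `b2` and the 0.01 scaling factor are conventions assumed here.

```python
import numpy as np

def initialize_parameters(n_x, n_h, n_y, seed=0):
    """Small random weights break symmetry; biases can safely start at zero."""
    rng = np.random.default_rng(seed)
    return {
        "W1": rng.standard_normal((n_h, n_x)) * 0.01,   # input -> hidden
        "b1": np.zeros((n_h, 1)),
        "W2": rng.standard_normal((n_y, n_h)) * 0.01,   # hidden -> output
        "b2": np.zeros((n_y, 1)),
    }

params = initialize_parameters(n_x=2, n_h=4, n_y=1)
print({name: value.shape for name, value in params.items()})
```

Keeping the weights small also keeps tanh away from its flat regions at the start of training, where gradients are close to zero.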
The next step is to compute the forward propagation activations. We will use tanh in the hidden layer and sigmoid function in the output layer.
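The forward pass can be sketched as below, assuming a parameter dictionary with keys `W1`, `b1`, `W2`, `b2` and inputs stored one example per column (both are conventions assumed for this example):

```python
import numpy as np

def forward_propagation(X, params):
    """tanh in the hidden layer, sigmoid in the output layer."""
    Z1 = params["W1"] @ X + params["b1"]
    A1 = np.tanh(Z1)                        # hidden activations
    Z2 = params["W2"] @ A1 + params["b2"]
    A2 = 1.0 / (1.0 + np.exp(-Z2))          # output probabilities
    cache = {"A1": A1, "A2": A2}            # kept for back propagation
    return A2, cache

# With all-zero parameters every output is sigmoid(0) = 0.5
zero_params = {"W1": np.zeros((4, 2)), "b1": np.zeros((4, 1)),
               "W2": np.zeros((1, 4)), "b2": np.zeros((1, 1))}
X = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
A2, cache = forward_propagation(X, zero_params)
print(A2)   # [[0.5 0.5 0.5]]
```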
3.2 Compute the cost
Now, we need to compute the cost (error term), which can be computed using the log loss function that we discussed for logistic regression. To recap, the cost function is the sum of the loss function values for every input in the training data.
Loss = -Y * Log (A) - (1-Y) * Log (1 - A)
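A short sketch of the cost computation, where `A2` stands for the output-layer activations. The function name `compute_cost` is assumed, and the cost is averaged over the m examples rather than summed, which only rescales it.

```python
import numpy as np

def compute_cost(A2, Y):
    """Average log loss over all training examples."""
    m = Y.shape[1]
    losses = -Y * np.log(A2) - (1 - Y) * np.log(1 - A2)
    return float(np.sum(losses) / m)

A2 = np.array([[0.9, 0.1]])   # confident and correct predictions
Y = np.array([[1, 0]])
print(compute_cost(A2, Y))    # -log(0.9), about 0.105
```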
3.3 Compute Back Propagation
Now, the cost will be used to obtain the derivatives of the cost with respect to the model weights, that is, how much the cost changes for a small change in each weight. These derivative values will be used to update the model weights during the optimization (training) process.
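For a network with one tanh hidden layer and a sigmoid output trained on the average log loss, the gradients can be sketched as follows. The names `backward_propagation`, `cache`, and the dictionary keys are assumptions; the example runs a small forward pass inline only so that the snippet is self-contained.

```python
import numpy as np

def backward_propagation(params, cache, X, Y):
    """Gradients of the average log loss with respect to each parameter."""
    m = X.shape[1]
    A1, A2 = cache["A1"], cache["A2"]
    dZ2 = A2 - Y                                    # sigmoid + log loss simplify to this
    dW2 = dZ2 @ A1.T / m
    db2 = np.sum(dZ2, axis=1, keepdims=True) / m
    dZ1 = (params["W2"].T @ dZ2) * (1 - A1 ** 2)    # derivative of tanh is 1 - tanh^2
    dW1 = dZ1 @ X.T / m
    db1 = np.sum(dZ1, axis=1, keepdims=True) / m
    return {"dW1": dW1, "db1": db1, "dW2": dW2, "db2": db2}

# Small example with 2 inputs, 3 hidden units, 5 training examples
rng = np.random.default_rng(0)
X = rng.standard_normal((2, 5))
Y = np.array([[0, 1, 0, 1, 1]])
params = {"W1": rng.standard_normal((3, 2)) * 0.1, "b1": np.zeros((3, 1)),
          "W2": rng.standard_normal((1, 3)) * 0.1, "b2": np.zeros((1, 1))}
A1 = np.tanh(params["W1"] @ X + params["b1"])
A2 = 1.0 / (1.0 + np.exp(-(params["W2"] @ A1 + params["b2"])))
grads = backward_propagation(params, {"A1": A1, "A2": A2}, X, Y)
print({name: g.shape for name, g in grads.items()})
```

Each gradient has the same shape as the parameter it belongs to, which makes the update step a simple element-wise subtraction.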
3.4 Update the parameters
In the final step, we need to run an optimization algorithm that performs a number of iterations. In every iteration, the cost function and the derivatives of the cost with respect to the weights and biases are computed. These derivatives are used to update the current values of the weights and biases. The process is repeated for many iterations with the aim of minimising the cost function value.
The essential components of the neural network are now implemented. Let's put all of the above together to implement a neural network.
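A single gradient descent update could be sketched like this, assuming the gradients are stored under the parameter name prefixed with `d` (for example `dW1` for `W1`); that naming and the `update_parameters` helper are conventions assumed here.

```python
import numpy as np

def update_parameters(params, grads, learning_rate=0.1):
    """Move every parameter one small step against its gradient."""
    return {name: value - learning_rate * grads["d" + name]
            for name, value in params.items()}

params = {"W1": np.array([[1.0, 2.0]]), "b1": np.array([[0.0]])}
grads = {"dW1": np.array([[0.5, -0.5]]), "db1": np.array([[1.0]])}
updated = update_parameters(params, grads, learning_rate=0.1)
print(updated["W1"])   # [[0.95 2.05]]
print(updated["b1"])   # [[-0.1]]
```

In a full training loop this update follows the forward pass, the cost computation, and the backward pass in every iteration.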
Generate a synthetic dataset using the following function and train the neural network model
Implement a function that predicts the values and compute the accuracy
Compute the accuracy on test set
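The last three steps can be sketched end to end as follows. The original dataset generator is not shown here, so `make_dataset` is a hypothetical stand-in that draws two Gaussian blobs, one per class; the function names, the hidden layer size, and the hyperparameters are also assumptions.

```python
import numpy as np

def make_dataset(m=400, seed=0):
    """Hypothetical synthetic data: two Gaussian blobs, one per class."""
    rng = np.random.default_rng(seed)
    Y = rng.integers(0, 2, size=(1, m))
    centers = np.where(Y == 1, 2.0, -2.0)        # blob centre for each example
    X = centers + rng.standard_normal((2, m))    # broadcasts to shape (2, m)
    return X, Y

def forward(X, p):
    A1 = np.tanh(p["W1"] @ X + p["b1"])
    A2 = 1.0 / (1.0 + np.exp(-(p["W2"] @ A1 + p["b2"])))
    return A1, A2

def nn_model(X, Y, n_h=4, num_iterations=2000, learning_rate=0.3, seed=1):
    """Train a one-hidden-layer network with plain gradient descent."""
    rng = np.random.default_rng(seed)
    n_x, n_y, m = X.shape[0], Y.shape[0], X.shape[1]
    p = {"W1": rng.standard_normal((n_h, n_x)) * 0.01, "b1": np.zeros((n_h, 1)),
         "W2": rng.standard_normal((n_y, n_h)) * 0.01, "b2": np.zeros((n_y, 1))}
    for _ in range(num_iterations):
        A1, A2 = forward(X, p)                           # forward propagation
        dZ2 = A2 - Y                                     # backward propagation
        dZ1 = (p["W2"].T @ dZ2) * (1 - A1 ** 2)
        grads = {"W2": dZ2 @ A1.T / m, "b2": dZ2.sum(axis=1, keepdims=True) / m,
                 "W1": dZ1 @ X.T / m, "b1": dZ1.sum(axis=1, keepdims=True) / m}
        for name in p:                                   # parameter update
            p[name] = p[name] - learning_rate * grads[name]
    return p

def predict(p, X):
    """Label an example 1 when the output probability exceeds 0.5."""
    return (forward(X, p)[1] > 0.5).astype(int)

X, Y = make_dataset()
X_train, Y_train = X[:, :300], Y[:, :300]
X_test, Y_test = X[:, 300:], Y[:, 300:]

params = nn_model(X_train, Y_train)
test_accuracy = np.mean(predict(params, X_test) == Y_test)
print(f"Test accuracy: {test_accuracy:.2%}")
```

Accuracy here is simply the fraction of test examples whose predicted label matches the true label.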