
Exploring CNN with TensorFlow & Keras


A Convolutional Neural Network, or CNN for short, is one of the most widely used neural network architectures for image recognition. Its use cases extend to many powerful tasks, such as object detection within an image, image classification, facial recognition, and gesture recognition. Indeed, CNNs are designed with some resemblance to the image-recognition process in the human brain. For instance, in the visual cortex, neurons have local receptive fields, meaning they respond to stimuli only in a specific region of the visual field; a CNN achieves this using kernels, or filters. Both the human brain and a CNN process visual information in a hierarchical manner. In the brain, lower-level neurons extract basic information from an image, and higher-level neurons integrate that information to identify complex patterns. In a CNN, likewise, we stack multiple convolutional layers to extract hierarchical features from the input.

There are many frameworks and libraries designed for developing neural network applications, and they can be used for CNNs as well. One of the most widely used is TensorFlow, developed by Google and released in 2015. Let's demonstrate TensorFlow with a simple CNN use case: a model that distinguishes between images of animals and buildings. Classifying animals versus buildings is considerably simpler than, say, classifying different breeds of dogs, so it makes a good example for understanding CNNs.


Preparing the dataset

I have collected a few images of animals and buildings and organized them in a folder on my laptop. It is best to keep the images in folders labelled with their class names: animal images in an animal folder and building images in a building folder. Let's read those images and prepare our training and testing data. The image arrays will be stored in X and their corresponding labels (classes) in Y.
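A minimal sketch of this loading step is below; the base directory path and the use of OpenCV to read and resize the images are illustrative assumptions:

import os
import cv2
import numpy as np

base_dir = "data"                       # hypothetical folder holding the two class folders
classes = {"animal": 0, "building": 1}  # map folder names to integer labels

X, Y = [], []
for class_name, label in classes.items():
    folder = os.path.join(base_dir, class_name)
    for file_name in os.listdir(folder):
        img = cv2.imread(os.path.join(folder, file_name))
        if img is None:                 # skip files that are not readable images
            continue
        img = cv2.resize(img, (224, 224))  # match the model's expected input size
        X.append(img)
        Y.append(label)

X = np.array(X)
Y = np.array(Y)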


Next, we need to split our data for training and validation. A common practice is an 80/20 split, where 80% of the images are used for training and 20% for validating the model (validation data is unseen by the model during training). Notice that I have normalized the images by dividing the pixel values by 255; normalization is a standard step in image processing.
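A possible version of this split-and-normalize step, assuming the X and Y arrays from the sketch above and scikit-learn for the split:

from sklearn.model_selection import train_test_split

# 80/20 split; random_state fixes the shuffle for reproducibility
train_X, test_X, train_Y, test_Y = train_test_split(X, Y, test_size=0.2, random_state=42)

# Scale pixel values from [0, 255] down to [0, 1]
train_X = train_X.astype("float32") / 255.0
test_X = test_X.astype("float32") / 255.0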


Another, optional, step is image augmentation, which produces diversified versions of the available images. It is a very useful technique when you have a small amount of data, since it lets you enlarge the dataset artificially. More information about image augmentation can be found on our page, Data Pre-processing with Datasets and Data Loaders. That page uses PyTorch, but the concept carries over directly to TensorFlow.
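For reference, here is a small augmentation sketch using Keras' ImageDataGenerator; the particular transformations and their ranges are illustrative choices, not requirements:

from tensorflow.keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(
    rotation_range=20,       # random rotations up to 20 degrees
    width_shift_range=0.1,   # random horizontal shifts
    height_shift_range=0.1,  # random vertical shifts
    horizontal_flip=True,    # mirror images left-to-right
    zoom_range=0.1,          # random zoom in/out
)

# The generator can then feed augmented batches into training, for example:
# model.fit(datagen.flow(train_X, train_Y, batch_size=32), epochs=20)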



Compiling and training the model

It's time to create a CNN model using the layering components in the library. For CNN models, we mainly use convolutional layers, pooling layers, and flattening and dense layers, each with its specific function. You will notice that I have opted for a couple of sets of convolutional and max-pooling layers, followed by a flatten layer and dense layers at the output. The Conv2D parameters specify the number of filters (kernels) to use, the kernel size, the activation function (here, ReLU), the input shape of the data (here, 224x224x3), and the stride in each dimension (height and width).
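One way to assemble such an architecture is sketched below. Only the layer types, the ReLU activation, and the 224x224x3 input shape come from the description above; the filter counts, kernel sizes, and dense-layer width are assumptions:

from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Conv2D(32, (3, 3), strides=(1, 1), activation="relu",
                  input_shape=(224, 224, 3)),   # first convolutional block
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), strides=(1, 1), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),                           # 2D feature maps -> 1D vector
    layers.Dense(64, activation="relu"),
    layers.Dense(1, activation="sigmoid"),      # binary output: animal vs. building
])

model.summary()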



Compiling the model

Furthermore, the compile function in the TensorFlow-Keras library is used to configure the learning process of the model. It specifies the optimizer, the loss function, and the evaluation metrics. The optimizer and loss function are crucial parts of a deep learning architecture: by finding optimal parameter updates, our goal is to decrease the loss at each iteration, or epoch. One of the most prominent optimization algorithms in deep learning is gradient descent.
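A plausible compile call for this binary classifier is shown below; the choice of Adam (a gradient-descent variant) and binary cross-entropy is an assumption, consistent with the sigmoid output in the model sketch above:

model.compile(
    optimizer="adam",               # adaptive gradient-descent optimizer
    loss="binary_crossentropy",     # standard loss for two-class problems
    metrics=["accuracy"],           # metric reported during training and validation
)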


Training the model

Finally, it's time to run the training. The fit function in the TensorFlow-Keras library runs the training of the model. Its parameters are the training data (here, train_X and train_Y), the number of epochs (how many times the model passes through the training data during learning), the validation data (here, test_X and test_Y), and callback objects. Callbacks perform certain actions at various points during training, such as early stopping, or TensorBoard logging to send metrics to TensorBoard for visualization and troubleshooting.
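A sketch of that training call follows; the epoch count, batch size, early-stopping patience, and log directory are illustrative values:

from tensorflow.keras.callbacks import EarlyStopping, TensorBoard

callbacks = [
    EarlyStopping(monitor="val_loss", patience=3, restore_best_weights=True),
    TensorBoard(log_dir="logs"),    # hypothetical directory for TensorBoard logs
]

history = model.fit(
    train_X, train_Y,
    epochs=20,
    batch_size=32,
    validation_data=(test_X, test_Y),
    callbacks=callbacks,
)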

Below is the output of the program. The model ended up with 80% accuracy on the test data. We can save this model to any location and use it later for classifying images of animals and buildings. The accuracy and loss graphs are a little jagged; they could be smoothed by incorporating more training data. Ideally, accuracy should increase steadily and loss should decrease in the same fashion. That's it!
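Saving and reloading the model could look like this; the file name is an illustrative choice:

model.save("animal_building_classifier.keras")   # hypothetical file name

# Later, load it back for inference:
# from tensorflow.keras.models import load_model
# model = load_model("animal_building_classifier.keras")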




