
RNN model training with PyTorch

A Recurrent Neural Network (RNN) is a type of artificial neural network designed for processing sequential data like time series, speech, and text. Unlike traditional neural networks, RNNs have memory that allows them to retain information from previous inputs, making them effective for tasks where context matters. 
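
The "memory" here is the hidden state that the network carries forward from one time step to the next. As a minimal sketch (a toy illustration, separate from the sentiment-analysis example below), feeding a sequence through PyTorch's nn.RNN one step at a time and reusing the hidden state looks like this:

# Toy illustration of the recurrent hidden state (not the model trained below)
import torch
import torch.nn as nn

rnn = nn.RNN(input_size=4, hidden_size=3)   # arbitrary toy dimensions
sequence = torch.randn(5, 1, 4)             # 5 time steps, batch of 1, 4 features per step
hidden = torch.zeros(1, 1, 3)               # initial hidden state

for t in range(sequence.size(0)):
    # the hidden state produced at step t is fed back in at step t + 1
    out, hidden = rnn(sequence[t:t + 1], hidden)
    print(f"step {t}: hidden state = {hidden.squeeze().tolist()}")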

RNNs have played a crucial role in modern AI applications, especially in Natural Language Processing (NLP) and speech processing.

Below is a proof of concept demonstrating the training of an RNN model for sentiment analysis using the publicly available IMDB dataset.

Process:

  • Install torchtext 0.6.0
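
In a notebook environment (the Google Drive path used later in this post suggests Colab), the install step might look like the following; the spaCy English model is included because the tokenizer defined below uses en_core_web_sm:

# Install torchtext 0.6.0 (the legacy Field / BucketIterator API used in this post)
# and the spaCy English model used by the tokenizer. The '!' prefix assumes a
# notebook such as Google Colab.
!pip install torchtext==0.6.0
!python -m spacy download en_core_web_sm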


  • Importing the PyTorch libraries

# Styrish AI

# Importing the PyTorch libraries

import torch
#=================================================================
# Check whether a GPU is available; if so, run the code on the GPU,
# otherwise fall back to the CPU.

if torch.cuda.is_available():
    device = torch.device("cuda")
    print(f"GPU is available: {torch.cuda.get_device_name(0)}")
else:
    device = torch.device("cpu")
    print("GPU is not available, using CPU.")
#=================================================================
import torch.nn as nn
import torch.optim as optim
from torchtext import datasets, data
from torchtext.data import Field, TabularDataset, BucketIterator
from torchtext.data.utils import get_tokenizer
from torch.utils.data import DataLoader, Subset
import random

  • Defining tokenizer and dictionary fields.
# Author Styrish AI

# Define Fields for preprocessing

# get_tokenizer returns a torchtext out-of-the-box tokenizer that splits
# raw text into tokens so it can be numericalized. For example,
# 'This is a sentence' becomes ['This', 'is', 'a', 'sentence'].
tokenizer = get_tokenizer('spacy', language='en_core_web_sm')

TEXT = data.Field(tokenize=tokenizer, lower=True)
LABEL = data.LabelField(dtype=torch.long)

# Define fields dictionary for loading the data
fields = [('text', TEXT), ('label', LABEL)]


  • Split the IMDB dataset into train and test data.
# Styrish AI

# Splitting the PyTorch prebuilt IMDB dataset into train and test sets:
# train_data will be used for training and
# test_data will be used for testing.

train_data, test_data = datasets.IMDB.splits(TEXT, LABEL)
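
As a quick, optional sanity check (not part of the original flow), one can peek at a single processed example to confirm the tokenization and labels look right:

# Optional sanity check: inspect one tokenized example and its label
sample = train_data.examples[0]
print(sample.text[:10])   # first 10 tokens of the review
print(sample.label)       # 'pos' or 'neg'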


  • Constructing the dataloaders
# Author : Styrish AI

# Constructing data loaders for the training and testing data.
# In PyTorch, datasets are wrapped in data loaders before being
# fed to the training and testing loops (see the CNN page).

# We will use the torchtext out-of-the-box loader, i.e. BucketIterator.

print(len(train_data))
print(len(test_data))

TEXT.build_vocab(train_data, max_size=20000)
# Build the label vocabulary from the training data as well.
LABEL.build_vocab(train_data)

train_iterator, test_iterator = data.BucketIterator.splits(
    (train_data, test_data),
    batch_size=100,
    sort_key=lambda x: len(x.text),
    sort_within_batch=False,
    shuffle=True  # Shuffle the data for the training set
)
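
Before training, it can also help to pull one batch from the iterator and confirm the tensor shapes (an optional inspection sketch; the exact sequence length depends on the longest review in the batch):

# Optional: peek at one batch to confirm shapes.
# batch.text is (sequence_length, batch_size); batch.label is (batch_size,)
sample_batch = next(iter(train_iterator))
print(sample_batch.text.shape)
print(sample_batch.label.shape)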




  • Creating a model class.
# Author : Styrish AI

# Creating the model class. This uses the RNN-specific layers provided by
# PyTorch's nn module: an Embedding layer and an LSTM layer, whereas the CNN
# model used CNN-specific layers such as Conv2d and MaxPool2d.

# RNN model for sentiment analysis
class SentimentRNN(nn.Module):
    def __init__(self, input_size, embedding_dim, hidden_size, output_size, num_layers=1):
        super(SentimentRNN, self).__init__()
        self.embedding = nn.Embedding(input_size, embedding_dim)
        self.rnn = nn.LSTM(embedding_dim, hidden_size, num_layers=num_layers)
        self.fc = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        # Apply embedding to the input sequence:
        # (seq_len, batch) -> (seq_len, batch, embedding_dim)
        embedded = self.embedding(x)

        # LSTM layer; hidden holds the final hidden state of each layer
        output, (hidden, cell) = self.rnn(embedded)

        # Take the final hidden state of the last layer: (batch, hidden_size)
        hidden = hidden[-1]

        # Fully connected layer produces the class logits.
        # No sigmoid/softmax here: nn.CrossEntropyLoss expects raw logits.
        return self.fc(hidden)


  • Instantiate the model, loss function, and optimizer.
# Author : Styrish AI

# Instantiate the model, loss function, and optimizer.
# See the CNN page for details on these components.

# Hyperparameters: input_size and output_size come from the vocabularies built
# above; embedding_dim and hidden_size are typical choices and can be tuned.
input_size = len(TEXT.vocab)      # vocabulary size
embedding_dim = 100
hidden_size = 256
output_size = len(LABEL.vocab)    # number of classes (pos / neg)

model = SentimentRNN(input_size, embedding_dim, hidden_size, output_size)
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)
#print(model)
model.to(device)
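
As a quick sanity check before training (purely illustrative; the dummy shape of 50 tokens by 4 reviews is arbitrary), one can count the trainable parameters and run a forward pass on a batch of random token indices:

# Optional sanity check: parameter count and a dummy forward pass
num_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"Trainable parameters: {num_params:,}")

dummy = torch.randint(0, input_size, (50, 4)).to(device)  # (seq_len=50, batch=4) of token ids
with torch.no_grad():
    print(model(dummy).shape)  # expected: (4, output_size)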



  • Training the model.
# Author : Styrish AI

# Running the training loop for 15 epochs.
# See the CNN page / other blog posts for details on the training loop.

num_epochs = 15
cost_list = []
accuracy_list = []
for epoch in range(num_epochs):
    # Training loop
    model.train()
    cost = 0
    for batch in train_iterator:
        inputs = batch.text
        labels = batch.label
        inputs, labels = inputs.to(device), labels.to(device)
        optimizer.zero_grad()
        predictions = model(inputs)
        loss = criterion(predictions, labels)
        loss.backward()
        optimizer.step()
        cost += loss.item()

    cost_list.append(cost)

    # Validation loop: evaluate the model on the test data after each epoch
    model.eval()
    with torch.no_grad():
        val_loss = 0.0
        correct = 0
        total = 0
        for batch in test_iterator:
            inputs = batch.text
            labels = batch.label
            inputs, labels = inputs.to(device), labels.to(device)
            outputs = model(inputs)
            val_loss += criterion(outputs, labels).item()
            _, predicted = torch.max(outputs, 1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()

    avg_val_loss = val_loss / len(test_iterator)
    accuracy = correct / total
    accuracy_list.append(accuracy)
    print(f'Epoch {epoch+1}/{num_epochs}, '
          f'Loss: {loss.item():.4f}, Val Loss: {avg_val_loss:.4f}, '
          f'Accuracy: {accuracy:.2%}')


Looking at the per-epoch output, accuracy goes up with each epoch (slight fluctuations here and there are OK).




  • Plotting accuracy vs. loss graph.
# Author: Styrish AI

# Finally, plot the accuracy vs. loss graph to show the model's performance.
# Ideally, the loss line should trend downwards and the accuracy line should
# trend upwards for a well-trained model.

import matplotlib.pyplot as plt
fig, ax1 = plt.subplots()
color = 'tab:red'
ax1.plot(cost_list,color=color)
ax1.set_xlabel('epoch',color=color)
ax1.set_ylabel('total loss',color=color)
ax1.tick_params(axis='y', labelcolor=color)

ax2 = ax1.twinx()
color = 'tab:blue'
ax2.set_ylabel('accuracy', color=color)
ax2.plot( accuracy_list, color=color)
ax2.tick_params(axis='y', labelcolor=color)
fig.tight_layout()
plt.show()






That's it!


  • Saving the model.
# Author: Styrish AI

# Once satisfied, save the model for future use. I have saved the trained
# model to Google Drive.

trained_model_path = '/content/drive/MyDrive/deep-learning/models/rnn/sentiment-analysis/model.pth'
torch.save(model.state_dict(), trained_model_path)
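
To use the saved weights later, one might reload them and score a single review. The sketch below is illustrative (predict_sentiment is a hypothetical helper, not part of the original post) and assumes the TEXT/LABEL fields, tokenizer, and hyperparameters defined above are still in memory:

# Illustrative sketch: reload the saved weights and classify one review
model.load_state_dict(torch.load(trained_model_path, map_location=device))
model.eval()

def predict_sentiment(sentence):
    tokens = tokenizer(sentence)
    indices = [TEXT.vocab.stoi[t.lower()] for t in tokens]
    tensor = torch.LongTensor(indices).unsqueeze(1).to(device)  # (seq_len, batch=1)
    with torch.no_grad():
        logits = model(tensor)
    return LABEL.vocab.itos[logits.argmax(dim=1).item()]

print(predict_sentiment("This movie was absolutely wonderful."))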
