
RNN model training with PyTorch

A Recurrent Neural Network (RNN) is a type of artificial neural network designed for processing sequential data like time series, speech, and text. Unlike traditional neural networks, RNNs have memory that allows them to retain information from previous inputs, making them effective for tasks where context matters. 
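
The "memory" here is the hidden state that the network carries forward from one time step to the next. As a minimal sketch (a toy illustration, separate from the sentiment-analysis example below), feeding a sequence through PyTorch's nn.RNN one step at a time and reusing the hidden state looks like this:

# Toy illustration of the recurrent hidden state (not the model trained below)
import torch
import torch.nn as nn

rnn = nn.RNN(input_size=4, hidden_size=3)   # arbitrary toy dimensions
sequence = torch.randn(5, 1, 4)             # 5 time steps, batch of 1, 4 features per step
hidden = torch.zeros(1, 1, 3)               # initial hidden state

for t in range(sequence.size(0)):
    # the hidden state produced at step t is fed back in at step t + 1
    out, hidden = rnn(sequence[t:t + 1], hidden)
    print(f"step {t}: hidden state = {hidden.squeeze().tolist()}")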

RNNs have played a crucial role in modern AI applications, especially in Natural Language Processing (NLP) and speech processing.

Below is a proof of concept demonstrating the training of an RNN model for sentiment analysis using the publicly available IMDB dataset.

Process:

  • Install torchtext 0.6.0
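
In a notebook environment (the Google Drive path used later in this post suggests Colab), the install step might look like the following; the spaCy English model is included because the tokenizer defined below uses en_core_web_sm:

# Install torchtext 0.6.0 (the legacy Field / BucketIterator API used in this post)
# and the spaCy English model used by the tokenizer. The '!' prefix assumes a
# notebook such as Google Colab.
!pip install torchtext==0.6.0
!python -m spacy download en_core_web_sm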


  • Importing the PyTorch libraries

# Styrish AI

# Importing the PyTorch libraries

import torch
#=================================================================
# Check whether a GPU is available; if so, run the code on the GPU,
# otherwise fall back to the CPU.

if torch.cuda.is_available():
    device = torch.device("cuda")
    print(f"GPU is available: {torch.cuda.get_device_name(0)}")
else:
    device = torch.device("cpu")
    print("GPU is not available, using CPU.")
#=================================================================
import torch.nn as nn
import torch.optim as optim
from torchtext import datasets, data
from torchtext.data import Field, TabularDataset, BucketIterator
from torchtext.data.utils import get_tokenizer
from torch.utils.data import DataLoader, Subset
import random

  • Defining tokenizer and dictionary fields.
# Author Styrish AI

# Define Fields for preprocessing

# get_tokenizer returns a torchtext out-of-the-box tokenizer that splits
# raw text into tokens so it can be numericalized. For example,
# 'This is a sentence' becomes ['This', 'is', 'a', 'sentence'].
tokenizer = get_tokenizer('spacy', language='en_core_web_sm')

TEXT = data.Field(tokenize=tokenizer, lower=True)
LABEL = data.LabelField(dtype=torch.long)

# Define fields dictionary for loading the data
fields = [('text', TEXT), ('label', LABEL)]


  • Split the IMDB dataset into train and test data.
# Styrish AI

# Splitting the PyTorch prebuilt IMDB dataset into train and test sets:
# train_data will be used for training and
# test_data will be used for testing.

train_data, test_data = datasets.IMDB.splits(TEXT, LABEL)
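
As a quick, optional sanity check (not part of the original flow), one can peek at a single processed example to confirm the tokenization and labels look right:

# Optional sanity check: inspect one tokenized example and its label
sample = train_data.examples[0]
print(sample.text[:10])   # first 10 tokens of the review
print(sample.label)       # 'pos' or 'neg'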


  • Constructing the dataloaders
# Author : Styrish AI

# Constructing data loaders for the training and testing data.
# In PyTorch, datasets are wrapped in data loaders before being
# fed to the training and testing loops (see the CNN page).

# We will use the torchtext out-of-the-box loader, i.e. BucketIterator.

print(len(train_data))
print(len(test_data))

TEXT.build_vocab(train_data, max_size=20000)
# Build the label vocabulary from the training data as well.
LABEL.build_vocab(train_data)

train_iterator, test_iterator = data.BucketIterator.splits(
    (train_data, test_data),
    batch_size=100,
    sort_key=lambda x: len(x.text),
    sort_within_batch=False,
    shuffle=True  # Shuffle the data for the training set
)
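
Before training, it can also help to pull one batch from the iterator and confirm the tensor shapes (an optional inspection sketch; the exact sequence length depends on the longest review in the batch):

# Optional: peek at one batch to confirm shapes.
# batch.text is (sequence_length, batch_size); batch.label is (batch_size,)
sample_batch = next(iter(train_iterator))
print(sample_batch.text.shape)
print(sample_batch.label.shape)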




  • Creating a model class.
# Author : Styrish AI

# Creating the model class. This uses the RNN-specific layers provided by
# PyTorch's nn module: an Embedding layer and an LSTM layer, whereas the CNN
# model used CNN-specific layers such as Conv2d and MaxPool2d.

# RNN model for sentiment analysis
class SentimentRNN(nn.Module):
    def __init__(self, input_size, embedding_dim, hidden_size, output_size, num_layers=1):
        super(SentimentRNN, self).__init__()
        self.embedding = nn.Embedding(input_size, embedding_dim)
        self.rnn = nn.LSTM(embedding_dim, hidden_size, num_layers=num_layers)
        self.fc = nn.Linear(hidden_size, output_size)

    def forward(self, x):
        # Apply embedding to the input sequence:
        # (seq_len, batch) -> (seq_len, batch, embedding_dim)
        embedded = self.embedding(x)

        # LSTM layer; hidden holds the final hidden state of each layer
        output, (hidden, cell) = self.rnn(embedded)

        # Take the final hidden state of the last layer: (batch, hidden_size)
        hidden = hidden[-1]

        # Fully connected layer produces the class logits.
        # No sigmoid/softmax here: nn.CrossEntropyLoss expects raw logits.
        return self.fc(hidden)


  • Instantiate the model, loss function, and optimizer.
# Author : Styrish AI

# Instantiate the model, loss function, and optimizer.
# See the CNN page for details on these components.

# Hyperparameters: input_size and output_size come from the vocabularies built
# above; embedding_dim and hidden_size are typical choices and can be tuned.
input_size = len(TEXT.vocab)      # vocabulary size
embedding_dim = 100
hidden_size = 256
output_size = len(LABEL.vocab)    # number of classes (pos / neg)

model = SentimentRNN(input_size, embedding_dim, hidden_size, output_size)
criterion = nn.CrossEntropyLoss()
optimizer = optim.Adam(model.parameters(), lr=0.001)
#print(model)
model.to(device)
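
As a quick sanity check before training (purely illustrative; the dummy shape of 50 tokens by 4 reviews is arbitrary), one can count the trainable parameters and run a forward pass on a batch of random token indices:

# Optional sanity check: parameter count and a dummy forward pass
num_params = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"Trainable parameters: {num_params:,}")

dummy = torch.randint(0, input_size, (50, 4)).to(device)  # (seq_len=50, batch=4) of token ids
with torch.no_grad():
    print(model(dummy).shape)  # expected: (4, output_size)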



  • Training the model.
# Author : Styrish AI

# Running the training loop for 15 epochs.
# See the CNN page / other blog posts for details on the training loop.

num_epochs = 15
cost_list = []
accuracy_list = []
for epoch in range(num_epochs):
    # Training loop
    model.train()
    cost = 0
    for batch in train_iterator:
        inputs = batch.text
        labels = batch.label
        inputs, labels = inputs.to(device), labels.to(device)
        optimizer.zero_grad()
        predictions = model(inputs)
        loss = criterion(predictions, labels)
        loss.backward()
        optimizer.step()
        cost += loss.item()

    cost_list.append(cost)

    # Validation loop: evaluate the model on the test data after each epoch
    model.eval()
    with torch.no_grad():
        val_loss = 0.0
        correct = 0
        total = 0
        for batch in test_iterator:
            inputs = batch.text
            labels = batch.label
            inputs, labels = inputs.to(device), labels.to(device)
            outputs = model(inputs)
            val_loss += criterion(outputs, labels).item()
            _, predicted = torch.max(outputs, 1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()

    avg_val_loss = val_loss / len(test_iterator)
    accuracy = correct / total
    accuracy_list.append(accuracy)
    print(f'Epoch {epoch+1}/{num_epochs}, '
          f'Loss: {loss.item():.4f}, Val Loss: {avg_val_loss:.4f}, '
          f'Accuracy: {accuracy:.2%}')


Looking at the per-epoch output, accuracy goes up with each epoch (slight fluctuations here and there are OK).




  • Plotting accuracy vs. loss graph.
# Author: Styrish AI

# Finally, plot the accuracy vs. loss graph to show the model's performance.
# Ideally, the loss line should trend downwards and the accuracy line should
# trend upwards for a well-trained model.

import matplotlib.pyplot as plt
fig, ax1 = plt.subplots()
color = 'tab:red'
ax1.plot(cost_list,color=color)
ax1.set_xlabel('epoch',color=color)
ax1.set_ylabel('total loss',color=color)
ax1.tick_params(axis='y', labelcolor=color)

ax2 = ax1.twinx()
color = 'tab:blue'
ax2.set_ylabel('accuracy', color=color)
ax2.plot( accuracy_list, color=color)
ax2.tick_params(axis='y', labelcolor=color)
fig.tight_layout()
plt.show()






That's it!


  • Saving the model.
# Author: Styrish AI

# Once satisfied, save the model for future use. I have saved the trained
# model to Google Drive.

trained_model_path = '/content/drive/MyDrive/deep-learning/models/rnn/sentiment-analysis/model.pth'
torch.save(model.state_dict(), trained_model_path)
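
To use the saved weights later, one might reload them and score a single review. The sketch below is illustrative (predict_sentiment is a hypothetical helper, not part of the original post) and assumes the TEXT/LABEL fields, tokenizer, and hyperparameters defined above are still in memory:

# Illustrative sketch: reload the saved weights and classify one review
model.load_state_dict(torch.load(trained_model_path, map_location=device))
model.eval()

def predict_sentiment(sentence):
    tokens = tokenizer(sentence)
    indices = [TEXT.vocab.stoi[t.lower()] for t in tokens]
    tensor = torch.LongTensor(indices).unsqueeze(1).to(device)  # (seq_len, batch=1)
    with torch.no_grad():
        logits = model(tensor)
    return LABEL.vocab.itos[logits.argmax(dim=1).item()]

print(predict_sentiment("This movie was absolutely wonderful."))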
