Training your first LLM in Google Colab
Let's get our hands dirty
Hey friends. Today I’m showing you how to train a model in Google Colab using PyTorch, with GPT-4 writing the code for us. I’ll assume some of you have done this before, but most of you probably haven’t. Even if you don’t need or want to train a model yourself, it’s helpful to understand what’s going on.
Technical chops: Medium
Time to read: 5 minutes
Fun: You tell me
Many of you are in Product or Engineering and want to train a model but haven’t gotten to see what’s happening up close. The mechanics aren’t hard; the real value is understanding what’s going on and why.
What’s actually going on
We’re using the MNIST dataset (I asked ChatGPT to pick one), which is a collection of handwritten digits commonly used for training image-processing systems.
It contains 70,000 images of handwritten digits (0-9), each of which is 28x28 pixels. Each image is labeled with the digit it represents. The dataset is divided into two parts:
Training set with 60,000 images
Test set with 10,000 images
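If you want to sanity-check those numbers yourself, here’s a minimal sketch (torchvision comes pre-installed in Colab):

import torchvision

# Download both MNIST splits; no transform needed just to count them
train = torchvision.datasets.MNIST(root='./data', train=True, download=True)
test = torchvision.datasets.MNIST(root='./data', train=False, download=True)

print(len(train), len(test))  # 60000 10000
image, label = train[0]       # a 28x28 PIL image and its integer label
print(image.size, label)      # (28, 28) 5 -- the first training image is a 5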
At a high level, we are training a neural network model to recognize handwritten digits. The model takes as input an image of a handwritten digit and predicts the digit in the image.
During training, the model is shown many examples of handwritten digits along with the correct labels, and it learns to predict the correct digit for each image.
The model learns by adjusting its parameters to minimize the difference between the predicted digit and the actual label. Once the model is trained, it can be used to predict the digit in new images of handwritten digits.
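To make “adjusting its parameters” concrete, here’s a toy sketch of a few training steps on a single made-up image (every name here is mine, not from the post):

import torch
import torch.nn as nn

layer = nn.Linear(28 * 28, 10)                 # a one-layer stand-in for a model
criterion = nn.CrossEntropyLoss()              # measures prediction vs. label
optimizer = torch.optim.SGD(layer.parameters(), lr=0.01)

fake_image = torch.rand(1, 28 * 28)            # one flattened fake "image"
true_label = torch.tensor([3])                 # pretend the digit is a 3

for step in range(5):
    optimizer.zero_grad()                      # clear old gradients
    loss = criterion(layer(fake_image), true_label)
    loss.backward()                            # compute gradients of the loss
    optimizer.step()                           # nudge the weights downhill
    print(step, loss.item())                   # the loss shrinks step by step

That’s the whole loop the real training code below runs, just on 60,000 images instead of one.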
Prompt
“I need you to help me train a model using Google Colab and PyTorch. Please use a dataset that you find. I need you to write the code and explain what you're doing and why. Be thorough. The dataset shouldn't be too large. The new model should be able to be used in future applications.”
Side note: I asked it to use PyTorch because I prefer it to TensorFlow.
Code
import torch
import torch.nn as nn
import torch.optim as optim
import torchvision
import torchvision.transforms as transforms

# Define a transform to normalize the data
transform = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize((0.5,), (0.5,))
])

# Load the MNIST dataset
train_dataset = torchvision.datasets.MNIST(root='./data', train=True, download=True, transform=transform)
test_dataset = torchvision.datasets.MNIST(root='./data', train=False, transform=transform)

# Define data loaders
train_loader = torch.utils.data.DataLoader(train_dataset, batch_size=100, shuffle=True)
test_loader = torch.utils.data.DataLoader(test_dataset, batch_size=100, shuffle=False)

# Define the model
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        self.fc1 = nn.Linear(28*28, 500)
        self.fc2 = nn.Linear(500, 10)

    def forward(self, x):
        x = x.view(-1, 28*28)
        x = torch.relu(self.fc1(x))
        x = self.fc2(x)
        return x

# Instantiate the model
model = Net()

# Define the loss function and the optimizer
criterion = nn.CrossEntropyLoss()
optimizer = optim.SGD(model.parameters(), lr=0.01)

# Train the model
for epoch in range(10):  # loop over the dataset multiple times
    running_loss = 0.0
    for i, data in enumerate(train_loader, 0):
        # get the inputs; data is a list of [inputs, labels]
        inputs, labels = data

        # zero the parameter gradients
        optimizer.zero_grad()

        # forward + backward + optimize
        outputs = model(inputs)
        loss = criterion(outputs, labels)
        loss.backward()
        optimizer.step()

        # print statistics
        running_loss += loss.item()
        if i % 100 == 99:  # print every 100 mini-batches
            print('[%d, %5d] loss: %.3f' %
                  (epoch + 1, i + 1, running_loss / 100))
            running_loss = 0.0

print('Finished Training')

# Save the model
torch.save(model.state_dict(), 'model.pth')
print('Model saved as model.pth')
Sequence of events
Importing Libraries: We import necessary libraries like PyTorch, torchvision, and transforms.
Data Transformation: We define a transformation that converts each image to a tensor and normalizes it with a mean and standard deviation of 0.5, mapping pixel values from [0, 1] to [-1, 1].
Loading Dataset: We load the MNIST dataset for training and testing, and apply the previously defined transformations.
Data Loaders: We define data loaders for the training and testing datasets to load data in batches.
Model Definition: We define the neural network model with one hidden layer (there’s a quick parameter-count sketch right after this list).
Model Instantiation: We create an instance of the model.
Loss Function and Optimizer: We define the loss function (CrossEntropyLoss) and the optimizer (SGD) that will be used to train the model.
Model Training: We train the model using the training data for 10 epochs, updating the model parameters after each batch of images.
Saving the Model: We save the trained model parameters to a file called 'model.pth'.
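Curious what “one hidden layer” actually amounts to? Here’s a quick sketch that counts the trainable parameters, assuming you’ve already run the cell defining the Net class above:

model = Net()
total = sum(p.numel() for p in model.parameters())
print('%s trainable parameters' % total)
# fc1: 784*500 weights + 500 biases = 392,500
# fc2: 500*10 weights +  10 biases =   5,010
# total: 397,510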
Google Colab is wonderful because you can run this code and watch the output appear in real time as the training runs.
Note: An epoch is one complete pass through the entire training dataset. During an epoch, the model's parameters are updated in an effort to minimize the loss function.
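One thing worth noticing: the generated script defines test_loader but never uses it. If you want to know how well the model actually generalizes, here’s a minimal evaluation cell you could add after training (a sketch assuming model, torch, and test_loader from the script above; a simple two-layer net like this usually lands somewhere in the 90s on MNIST accuracy, but run it and see):

# Evaluate on the held-out test set
correct = 0
total = 0
model.eval()                               # switch to inference mode
with torch.no_grad():                      # no gradients needed to evaluate
    for images, labels in test_loader:
        outputs = model(images)
        predicted = outputs.argmax(dim=1)  # most confident digit per image
        correct += (predicted == labels).sum().item()
        total += labels.size(0)
print('Test accuracy: %.1f%%' % (100 * correct / total))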
On my run, the training took 2 minutes and 46 seconds to complete. After it’s done, it saves the new model in your file folder on the left. That new model can then be loaded in other applications and fine-tuned (refined further on new data).
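And here’s what “loaded in other applications” looks like in practice: a minimal sketch of reloading model.pth, assuming the Net class and test_dataset from the code above are available:

# In another notebook or app: recreate the architecture, then load the weights
model = Net()
model.load_state_dict(torch.load('model.pth'))
model.eval()

# Predict the digit in one test image (unsqueeze adds a batch dimension)
image, label = test_dataset[0]
prediction = model(image.unsqueeze(0)).argmax(dim=1).item()
print(prediction, label)  # ideally the same digit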
So, sky’s the limit with this stuff: the same workflow scales up to much bigger models (LLMs included) with the right datasets and parameters, and that’s what makes applications built on them suuuuper powerful.
Here’s the Google Colab file with the code if you’d like to run it yourself.
Please do me a favor and forward this post to someone who might dig it.