PyTorch Introduction

PyTorch tensors are similar to NumPy arrays, but they can also run on a GPU.

Let's take a look at what tensors look like in PyTorch:

import torch

T = torch.Tensor([[1,2],[3,4]])
print(T)

tensor([[1., 2.],
        [3., 4.]])

print(T**2)

tensor([[ 1.,  4.],
        [ 9., 16.]])
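
To make the GPU remark concrete, here is a minimal sketch that places a tensor on a GPU when one is available and falls back to the CPU otherwise. The names `device`, `T_dev`, and `squared` are illustrative and not part of the original example.

```python
import torch

# Pick the GPU if PyTorch can see one, otherwise stay on the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Create the tensor on the CPU, then move it to the chosen device.
T_dev = torch.tensor([[1., 2.], [3., 4.]]).to(device)

# The computation runs on whichever device holds the tensor.
squared = T_dev ** 2

# Bring the result back to the CPU, e.g. for printing or NumPy conversion.
squared = squared.cpu()
print(squared)
```

The same code runs unchanged on machines with or without a GPU, which is why the device is usually chosen once and reused.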

The library provides operators for component-wise and vector/matrix operations.

v = torch.tensor([ 10., 20., 30.])
M = torch.tensor([[ 0., 0., 3. ], [ 0., 2., 0. ], [ 1., 0., 0. ]])
print(v)
print(M)

print(M.mv(v))  # matrix-vector product
print(M @ v)    # same product using the @ operator

tensor([10., 20., 30.])
tensor([[0., 0., 3.],
        [0., 2., 0.],
        [1., 0., 0.]])
tensor([90., 40., 10.])
tensor([90., 40., 10.])
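
The difference between component-wise and matrix operations is easy to mix up, so here is a small sketch contrasting them. The tensors `a` and `b` are made-up values chosen for illustration.

```python
import torch

a = torch.tensor([[1., 2.], [3., 4.]])
b = torch.tensor([[10., 20.], [30., 40.]])

# Component-wise product: multiplies entries at matching positions.
elementwise = a * b

# Matrix product: rows of a combined with columns of b.
matmul = a @ b

# Inner product of two 1-D tensors (here the first rows of a and b).
dot = torch.dot(a[0], b[0])

print(elementwise)
print(matmul)
print(dot)
```

Note that `*` is always component-wise in PyTorch; the matrix product needs `@`, `torch.matmul`, or a method such as `mv`.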

In-place operations are suffixed with an underscore

T = torch.empty(2, 4)
T.fill_(0.05)
print(T)
T.add_(2)
print(T)

tensor([[0.0500, 0.0500, 0.0500, 0.0500],
        [0.0500, 0.0500, 0.0500, 0.0500]])
tensor([[2.0500, 2.0500, 2.0500, 2.0500],
        [2.0500, 2.0500, 2.0500, 2.0500]])

T += torch.randn(T.size())
print(T)

tensor([[3.1971, 2.6434, 1.7254, 3.2174],
        [3.6084, 2.6146, 2.0607, 2.3428]])
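
As a rough illustration of what the underscore suffix changes, the sketch below compares `add` with `add_` on a small tensor. The variable names `U` and `V` are hypothetical and only serve this comparison.

```python
import torch

T = torch.zeros(2, 2)

# Out-of-place: add() returns a new tensor and leaves T unchanged.
U = T.add(1)

# In-place: add_() writes the result into T's existing storage.
V = T.add_(5)

print(T)  # T now holds the in-place result
print(U)  # U is a separate tensor
# V refers to the same storage as T, while U has its own.
print(V.data_ptr() == T.data_ptr(), U.data_ptr() == T.data_ptr())
```

In-place operations save memory, but they overwrite values that autograd may still need, so they are best used with care on tensors that require gradients.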

You can convert a NumPy array to a tensor, or vice versa

import numpy as np

v = np.ones(6)
print(v)

T = torch.from_numpy(v)
print(T)

[1. 1. 1. 1. 1. 1.]
tensor([1., 1., 1., 1., 1., 1.], dtype=torch.float64)

T1 = torch.randn(3,3)
print(T1)
v1 = T1.numpy()
print(v1)

tensor([[-0.5909, -1.1718,  2.2015],
        [ 2.4058,  1.5632,  2.1710],
        [ 0.8559,  1.6580,  1.2921]])
[[-0.5908667 -1.171818   2.2015047]
 [ 2.4057586  1.5632267  2.1709604]
 [ 0.8558552  1.6579785  1.2920573]]

The tensor and the NumPy array share the same underlying memory

T.add_(1)
print(T)
print(v)

np.add(v1, 3, out=v1)
print(v1)
print(T1)

tensor([2., 2., 2., 2., 2., 2.], dtype=torch.float64)
[2. 2. 2. 2. 2. 2.]
[[2.4091334 1.828182  5.2015047]
 [5.405759  4.5632267 5.1709604]
 [3.8558552 4.6579785 4.292057 ]]
tensor([[2.4091, 1.8282, 5.2015],
        [5.4058, 4.5632, 5.1710],
        [3.8559, 4.6580, 4.2921]])
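
When the shared memory is not wanted, a copy can be made instead. The following sketch, with illustrative names `shared` and `copied`, contrasts `torch.from_numpy`, which shares the buffer, with `torch.tensor`, which copies the data.

```python
import numpy as np
import torch

a = np.ones(3)

shared = torch.from_numpy(a)  # shares memory with the NumPy array
copied = torch.tensor(a)      # copies the data into a new tensor

# Modify the NumPy array in place.
a += 1

print(shared)  # reflects the change made through a
print(copied)  # keeps the original values
```

Calling `.clone()` on a tensor is another way to obtain an independent copy.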

Autograd: automatic differentiation

Any tensor operation performed by PyTorch can be automatically differentiated by the autograd package.
We only need to write the forward pass; autograd tracks the associated computational graph and computes the gradients.
To have a tensor's operations tracked by autograd, set its requires_grad attribute to True.
Every tensor also has a grad field, itself a tensor of the same size and type, which is used to accumulate gradients.

# A simple example:
x = torch.tensor(1., requires_grad=True)
w = torch.tensor(2., requires_grad=True)
b = torch.tensor(3., requires_grad=True)

# Build a computational graph.
y = w * x + b    # y = 2 * x + 3
print(y.grad_fn)

y.backward()
print(x.grad)
print(w.grad)
print(b.grad)

<ThAddBackward object at 0x7f222a21aa90>
tensor(2.)
tensor(1.)
tensor(1.)
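
Because grad is used to accumulate gradients, repeated calls to backward() add to the stored value rather than replacing it. The sketch below illustrates this with the same x and w values as above; the names `first` and `second` are only for illustration.

```python
import torch

x = torch.tensor(1., requires_grad=True)
w = torch.tensor(2., requires_grad=True)

# First backward pass: d(w * x)/dx = w = 2
(w * x).backward()
first = x.grad.item()

# Second backward pass on a fresh graph: the new gradient is added to x.grad
(w * x).backward()
second = x.grad.item()

print(first, second)

# Reset the accumulated gradient in place before the next computation.
x.grad.zero_()
print(x.grad)
```

This accumulation is the reason training loops clear the gradients before each backward pass.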

PyTorch Modules - Neural Networks

Neural networks can be constructed using the torch.nn package.
Custom modules are defined as subclasses of torch.nn.Module.
We also use functions from torch.nn.functional, which are autograd-compliant.

import torch.nn as nn
import torch.nn.functional as F

x = torch.randn(2,3)
print(x)
x = F.relu(x)
print(x)

tensor([[-1.7101,  0.9627, -0.1172],
        [-0.5871, -0.3112, -1.7162]])
tensor([[0.0000, 0.9627, 0.0000],
        [0.0000, 0.0000, 0.0000]])
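
To show what "autograd-compliant" means in practice, here is a small sketch that backpropagates through F.relu. The input values are made up for the example.

```python
import torch
import torch.nn.functional as F

x = torch.tensor([-1.0, 0.5, 2.0], requires_grad=True)

# Sum the ReLU outputs so that we have a scalar to differentiate.
y = F.relu(x).sum()
y.backward()

# The gradient is 0 where the input was negative and 1 where it was positive.
print(x.grad)
```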

f = nn.Linear(in_features = 10, out_features = 4)
for n, p in f.named_parameters():
    print(n, p.size())

weight torch.Size([4, 10])
bias torch.Size([4])

x = torch.empty(350, 10).normal_()
y = f(x)
print(y.size())

torch.Size([350, 4])
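
The parameter shapes above suggest how the layer works: nn.Linear applies the affine map y = x W^T + b. The following sketch checks this by hand; the variable `manual` and the tolerance are assumptions made for the comparison.

```python
import torch
import torch.nn as nn

f = nn.Linear(in_features=10, out_features=4)
x = torch.randn(350, 10)

# Output of the layer itself.
y = f(x)

# The same computation written out: weight has shape (4, 10), so we transpose it.
manual = x @ f.weight.T + f.bias

print(torch.allclose(y, manual, atol=1e-5))
```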

Let's define a feedforward neural network as a module

# Fully connected neural network with one hidden layer
class NeuralNet(nn.Module):
    def __init__(self, input_size, hidden_size, num_classes):
        super(NeuralNet, self).__init__()
        self.fc1 = nn.Linear(input_size, hidden_size) 
        self.relu = nn.ReLU()
        self.fc2 = nn.Linear(hidden_size, num_classes)  
    
    def forward(self, x):
        out = self.fc1(x)
        out = self.relu(out)
        out = self.fc2(out)
        return out
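
As a quick sanity check of a module like the one above, it can be run on a random batch before any training. The class is repeated here only so the snippet runs on its own; the batch size of 4 and the random input are assumptions for illustration.

```python
import torch
import torch.nn as nn

class NeuralNet(nn.Module):
    def __init__(self, input_size, hidden_size, num_classes):
        super().__init__()
        self.fc1 = nn.Linear(input_size, hidden_size)
        self.relu = nn.ReLU()
        self.fc2 = nn.Linear(hidden_size, num_classes)

    def forward(self, x):
        return self.fc2(self.relu(self.fc1(x)))

model = NeuralNet(input_size=784, hidden_size=500, num_classes=10)

# A fake batch of 4 flattened 28x28 images.
batch = torch.randn(4, 784)
out = model(batch)
print(out.shape)  # one row of 10 class scores per input

# Count trainable parameters: (784*500 + 500) + (500*10 + 10)
n_params = sum(p.numel() for p in model.parameters())
print(n_params)
```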

import torchvision
import torchvision.transforms as transforms

input_size = 784
hidden_size = 500
num_classes = 10
num_epochs = 5
batch_size = 100
learning_rate = 0.001

# MNIST dataset 
train_dataset = torchvision.datasets.MNIST(root='../../data', 
                                           train=True, 
                                           transform=transforms.ToTensor(),  
                                           download=True)

test_dataset = torchvision.datasets.MNIST(root='../../data', 
                                          train=False, 
                                          transform=transforms.ToTensor())

# Data loader
train_loader = torch.utils.data.DataLoader(dataset=train_dataset, 
                                           batch_size=batch_size, 
                                           shuffle=True)

test_loader = torch.utils.data.DataLoader(dataset=test_dataset, 
                                          batch_size=batch_size, 
                                          shuffle=False)

model = NeuralNet(input_size, hidden_size, num_classes)

# Loss and optimizer
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)

# Train the model
total_step = len(train_loader)
for epoch in range(num_epochs):
    for i, (images, labels) in enumerate(train_loader):  
        # Flatten each 28x28 image into a 784-dimensional vector
        images = images.reshape(-1, 28*28)
        
        # Forward pass
        outputs = model(images)
        loss = criterion(outputs, labels)
        
        # Backward and optimize
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        
        if (i+1) % 100 == 0:
            print ('Epoch [{}/{}], Step [{}/{}], Loss: {:.4f}' 
                   .format(epoch+1, num_epochs, i+1, total_step, loss.item()))

Epoch [1/5], Step [100/600], Loss: 0.3366
Epoch [1/5], Step [200/600], Loss: 0.2874
Epoch [1/5], Step [300/600], Loss: 0.3106
Epoch [1/5], Step [400/600], Loss: 0.1440
Epoch [1/5], Step [500/600], Loss: 0.1801
Epoch [1/5], Step [600/600], Loss: 0.0872
Epoch [2/5], Step [100/600], Loss: 0.0701
Epoch [2/5], Step [200/600], Loss: 0.1648
Epoch [2/5], Step [300/600], Loss: 0.1814
Epoch [2/5], Step [400/600], Loss: 0.1003
Epoch [2/5], Step [500/600], Loss: 0.1224
Epoch [2/5], Step [600/600], Loss: 0.1301
Epoch [3/5], Step [100/600], Loss: 0.1409
Epoch [3/5], Step [200/600], Loss: 0.0185
Epoch [3/5], Step [300/600], Loss: 0.0263
Epoch [3/5], Step [400/600], Loss: 0.0580
Epoch [3/5], Step [500/600], Loss: 0.1400
Epoch [3/5], Step [600/600], Loss: 0.0614
Epoch [4/5], Step [100/600], Loss: 0.0196
Epoch [4/5], Step [200/600], Loss: 0.0103
Epoch [4/5], Step [300/600], Loss: 0.0380
Epoch [4/5], Step [400/600], Loss: 0.1080
Epoch [4/5], Step [500/600], Loss: 0.0327
Epoch [4/5], Step [600/600], Loss: 0.0468
Epoch [5/5], Step [100/600], Loss: 0.0610
Epoch [5/5], Step [200/600], Loss: 0.0176
Epoch [5/5], Step [300/600], Loss: 0.0519
Epoch [5/5], Step [400/600], Loss: 0.0801
Epoch [5/5], Step [500/600], Loss: 0.0516
Epoch [5/5], Step [600/600], Loss: 0.0318
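
The three calls at the heart of the loop (zero_grad, backward, step) can be tried without downloading MNIST. The sketch below repeats them on a single synthetic batch; the random data, the small nn.Sequential model, and the 20 steps are assumptions chosen to keep the example fast, not the setup used above.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Synthetic stand-in for one batch: 64 flattened "images" and random labels.
inputs = torch.randn(64, 784)
targets = torch.randint(0, 10, (64,))

model = nn.Sequential(nn.Linear(784, 32), nn.ReLU(), nn.Linear(32, 10))
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

losses = []
for step in range(20):
    # Forward pass
    outputs = model(inputs)
    loss = criterion(outputs, targets)

    # Backward and optimize
    optimizer.zero_grad()   # clear gradients left over from the previous step
    loss.backward()         # compute new gradients
    optimizer.step()        # update the parameters

    losses.append(loss.item())

print(losses[0], losses[-1])
```

Since the same batch is reused at every step, the loss should fall as the model fits it; on real data each step would see a different batch.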

with torch.no_grad():
    correct = 0
    total = 0
    for images, labels in test_loader:
        images = images.reshape(-1, 28*28)
        outputs = model(images)
        # The index of the largest output is the predicted class
        _, predicted = torch.max(outputs, 1)
        total += labels.size(0)
        correct += (predicted == labels).sum().item()

    print('Accuracy of the network on the 10000 test images: {} %'.format(100 * correct / total))

# Save the model checkpoint
torch.save(model.state_dict(), 'model.ckpt')

Accuracy of the network on the 10000 test images: 97.72 %
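
A saved state_dict can later be loaded into a freshly constructed model with the same architecture. The sketch below shows the round trip with a small nn.Linear layer and an in-memory buffer in place of the 'model.ckpt' file; both substitutions are assumptions made so the example leaves no files behind.

```python
import io

import torch
import torch.nn as nn

model = nn.Linear(4, 2)

# Save the parameters to an in-memory buffer (a file path works the same way).
buffer = io.BytesIO()
torch.save(model.state_dict(), buffer)
buffer.seek(0)

# Build a new model with the same architecture and restore the parameters.
restored = nn.Linear(4, 2)
restored.load_state_dict(torch.load(buffer))

print(torch.equal(model.weight, restored.weight))
print(torch.equal(model.bias, restored.bias))
```

Only the parameters are stored this way, so the code that defines the model class must still be available when loading.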