Linear regression: we'll build a model that predicts crop yields for apples and oranges (target variables) by looking at the average temperature, rainfall, and humidity (input variables or features) in a region. Here's the training data:

[Image: linear-regression-training-data]

In a linear regression model, each target variable is estimated as a weighted sum of the input variables, offset by a constant known as a bias:

yield_apple  = w11 * temp + w12 * rainfall + w13 * humidity + b1
yield_orange = w21 * temp + w22 * rainfall + w23 * humidity + b2
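Stacked into matrices, the same model can be written compactly (this is the form we implement below):

yield = X @ W.T + b

where X holds one row of [temp, rainfall, humidity] per region, W is the 2x3 weight matrix [[w11, w12, w13], [w21, w22, w23]], and b is the bias vector [b1, b2].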

Visually, this means that the yield of apples is a linear (planar) function of temperature, rainfall, and humidity:

[Image: linear-regression-graph]

import torch
import numpy as np 

Training Data

inputs = np.array([[73, 67, 43],
                   [91, 88, 64],
                   [87, 134, 58],
                   [102, 43, 37],
                   [69, 96, 70]], dtype='float32')
targets = np.array([[56, 70],
                    [81, 101],
                    [119, 133],
                    [22, 37],
                    [103, 119]], dtype='float32')
inputs = torch.from_numpy(inputs)
targets = torch.from_numpy(targets)
print(inputs)
print(targets)
tensor([[ 73.,  67.,  43.],
        [ 91.,  88.,  64.],
        [ 87., 134.,  58.],
        [102.,  43.,  37.],
        [ 69.,  96.,  70.]])
tensor([[ 56.,  70.],
        [ 81., 101.],
        [119., 133.],
        [ 22.,  37.],
        [103., 119.]])

Linear Regression Model from Scratch

w = torch.randn(2, 3, requires_grad=True) # torch.randn creates a tensor of the given shape, with elements drawn from a standard normal distribution
b = torch.randn(2, requires_grad=True)
print(w)
print(b)
tensor([[ 0.0728, -2.0486,  0.2053],
        [ 1.4556, -1.4721, -1.4280]], requires_grad=True)
tensor([-2.6483, -2.7893], requires_grad=True)

Our model is just X @ W.T + b: the inputs multiplied by the transposed weights, plus the bias.

def model(x):
    return x @ w.t() + b # @ performs matrix multiplication in PyTorch; .t() returns the transpose of a tensor
inputs @ w.t() + b
tensor([[-125.7638,  -56.5679],
        [-163.1629,  -91.2699],
        [-258.9209, -156.2401],
        [ -75.7193,   29.5415],
        [-179.9207, -143.6369]], grad_fn=<AddBackward0>)
preds = model(inputs)
preds
tensor([[-125.7638,  -56.5679],
        [-163.1629,  -91.2699],
        [-258.9209, -156.2401],
        [ -75.7193,   29.5415],
        [-179.9207, -143.6369]], grad_fn=<AddBackward0>)
print(targets)
tensor([[ 56.,  70.],
        [ 81., 101.],
        [119., 133.],
        [ 22.,  37.],
        [103., 119.]])
diff = preds - targets
# diff * diff                        # * is element-wise multiplication, not matrix multiplication
torch.sum(diff * diff) / diff.numel()  # numel() returns the number of elements in the tensor
tensor(53075.1758, grad_fn=<DivBackward0>)

Loss function

MSE loss: on average, each element of the predictions differs from the actual target by roughly the square root of the loss (the RMSE).

def mse(t1, t2):
    diff = t1 - t2
    return torch.sum(diff * diff) / diff.numel()
    
loss = mse(preds, targets)
print(loss)
tensor(53075.1758, grad_fn=<DivBackward0>)
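To make the "square root of the loss" interpretation concrete, we can print the RMSE (a small check added here, not part of the original outputs):

print(torch.sqrt(loss)) # on average, each prediction is off by roughly this much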
loss.backward()
print(w)
print(w.grad) # derivative of the loss w.r.t. each element of w
tensor([[ 0.0728, -2.0486,  0.2053],
        [ 1.4556, -1.4721, -1.4280]], requires_grad=True)
tensor([[-19571.1211, -23133.6465, -13756.3496],
        [-14156.5244, -17938.3672, -10636.8340]])
print(b)
print(b.grad)
tensor([-2.6483, -2.7893], requires_grad=True)
tensor([-236.8975, -175.6347])

The gradient of the loss w.r.t. each element of a tensor indicates the rate of change of the loss, i.e. the slope of the loss function along that element.

We can subtract from each weight element a small quantity proportional to the derivative of the loss w.r.t. that element, which reduces the loss slightly.

print(w)
w.grad
tensor([[ 0.0728, -2.0486,  0.2053],
        [ 1.4556, -1.4721, -1.4280]], requires_grad=True)
tensor([[-19571.1211, -23133.6465, -13756.3496],
        [-14156.5244, -17938.3672, -10636.8340]])
print(w)
w.grad * 1e-5 # the proposed update for each weight: a small fraction of its gradient
tensor([[ 0.0728, -2.0486,  0.2053],
        [ 1.4556, -1.4721, -1.4280]], requires_grad=True)
tensor([[-0.1957, -0.2313, -0.1376],
        [-0.1416, -0.1794, -0.1064]])
with torch.no_grad():
    w -= w.grad * 1e-5  # 1e-5 is the learning rate, kept small because the gradients are large
    b -= b.grad * 1e-5

We use torch.no_grad() to tell PyTorch not to track, calculate, or modify gradients while we update the weights and biases.
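A minimal illustration of what the context manager does (not part of the training code): gradient tracking is switched off inside the block and restored afterwards.

print(torch.is_grad_enabled())      # True
with torch.no_grad():
    print(torch.is_grad_enabled())  # False
print(torch.is_grad_enabled())      # True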

w, b
(tensor([[ 0.2685, -1.8173,  0.3429],
         [ 1.5971, -1.2927, -1.3216]], requires_grad=True),
 tensor([-2.6459, -2.7876], requires_grad=True))
preds = model(inputs)
loss = mse(preds, targets)
print(loss)
tensor(37202.4609, grad_fn=<DivBackward0>)

Now reset the gradients to zero, because PyTorch accumulates gradients whenever .backward() is called.

w.grad.zero_()
b.grad.zero_()
print(w.grad)
print(b.grad)
tensor([[0., 0., 0.],
        [0., 0., 0.]])
tensor([0., 0.])
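This matters because .backward() adds into .grad rather than overwriting it. A tiny standalone example (a hypothetical tensor x, just for illustration):

x = torch.tensor(2.0, requires_grad=True)
(x * x).backward()
print(x.grad)    # tensor(4.) -- d(x^2)/dx at x=2
(x * x).backward()
print(x.grad)    # tensor(8.) -- the second gradient was added to the first
x.grad.zero_()
print(x.grad)    # tensor(0.)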

Train the Model Using Gradient Descent

preds = model(inputs)
print(preds)
tensor([[ -90.0597,  -29.6393],
        [-116.1892,  -55.7924],
        [-202.9139, -113.7154],
        [ -40.7170,   55.6320],
        [-134.5765, -109.2006]], grad_fn=<AddBackward0>)
loss = mse(preds, targets)
print(loss)
tensor(37202.4609, grad_fn=<DivBackward0>)
loss.backward()
print(w.grad)
print(b.grad)
tensor([[-15880.6006, -19155.8594, -11304.5137],
        [-11370.2803, -14927.9023,  -8782.6719]])
tensor([-193.0913, -142.5432])

Update the weights and biases using gradient descent:

with torch.no_grad():
    w -= w.grad * 1e-5
    b -= b.grad * 1e-5
    w.grad.zero_()
    b.grad.zero_()
print(w)
print(b)
tensor([[ 0.4273, -1.6257,  0.4559],
        [ 1.7108, -1.1434, -1.2338]], requires_grad=True)
tensor([-2.6440, -2.7862], requires_grad=True)
preds = model(inputs)
loss = mse(preds, targets)
print(loss)
tensor(26488.4434, grad_fn=<DivBackward0>)

Train for Multiple Epochs

for i in range(100):
    preds = model(inputs)
    loss  = mse(preds, targets)
    loss.backward()
    with torch.no_grad():
        w -= w.grad * 1e-5
        b -= b.grad * 1e-5
        w.grad.zero_()
        b.grad.zero_()
preds = model(inputs)
loss = mse(preds, targets)
print(loss)
tensor(1333.2324, grad_fn=<DivBackward0>)
print(preds)
print(targets)
tensor([[ 65.7250,  83.3404],
        [ 92.7282,  99.8337],
        [ 80.9948, 113.9022],
        [ 73.8770, 114.4924],
        [ 88.8938,  71.9206]], grad_fn=<AddBackward0>)
tensor([[ 56.,  70.],
        [ 81., 101.],
        [119., 133.],
        [ 22.,  37.],
        [103., 119.]])

Linear Regression Using PyTorch Built-ins

import torch.nn as nn
inputs = np.array([[73, 67, 43],
                   [91, 88, 64],
                   [87, 134, 58],
                   [102, 43, 37],
                   [69, 96, 70],
                   [74, 66, 43],
                   [91, 87, 65],
                   [88, 134, 59],
                   [101, 44, 37],
                   [68, 96, 71],
                   [73, 66, 44],
                   [92, 87, 64],
                   [87, 135, 57],
                   [103, 43 ,36],
                   [68, 97, 70]], dtype='float32')


targets = np.array([[56,70],
                    [81, 101],
                    [119, 133],
                    [22, 37],
                    [103, 119],
                    [57,69],
                    [80,102],
                    [118, 132],
                    [21, 38],
                    [104, 118],
                    [57, 69],
                    [82, 100],
                    [118, 134],
                    [20, 38],
                    [102, 120]], dtype='float32')

inputs = torch.from_numpy(inputs)
targets = torch.from_numpy(targets)
print(inputs)
print(targets)
tensor([[ 73.,  67.,  43.],
        [ 91.,  88.,  64.],
        [ 87., 134.,  58.],
        [102.,  43.,  37.],
        [ 69.,  96.,  70.],
        [ 74.,  66.,  43.],
        [ 91.,  87.,  65.],
        [ 88., 134.,  59.],
        [101.,  44.,  37.],
        [ 68.,  96.,  71.],
        [ 73.,  66.,  44.],
        [ 92.,  87.,  64.],
        [ 87., 135.,  57.],
        [103.,  43.,  36.],
        [ 68.,  97.,  70.]])
tensor([[ 56.,  70.],
        [ 81., 101.],
        [119., 133.],
        [ 22.,  37.],
        [103., 119.],
        [ 57.,  69.],
        [ 80., 102.],
        [118., 132.],
        [ 21.,  38.],
        [104., 118.],
        [ 57.,  69.],
        [ 82., 100.],
        [118., 134.],
        [ 20.,  38.],
        [102., 120.]])

Dataset and DataLoader

We create a TensorDataset, which allows access to rows from inputs and targets as tuples, and provides the standard API for working with many different types of datasets in PyTorch.

from torch.utils.data import TensorDataset
train_ds = TensorDataset(inputs, targets)
train_ds[0:3]                                    # rows 0 through 2 (the end index is exclusive)
(tensor([[ 73.,  67.,  43.],
         [ 91.,  88.,  64.],
         [ 87., 134.,  58.]]),
 tensor([[ 56.,  70.],
         [ 81., 101.],
         [119., 133.]]))
from torch.utils.data import DataLoader
batch_size = 5
train_dl = DataLoader(train_ds, batch_size, shuffle=True)
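With 15 rows and a batch size of 5, the loader yields 3 batches per epoch, and shuffle=True reshuffles the rows each epoch (a quick check, assuming the dataset above):

print(len(train_dl)) # 3 batches of 5 rows each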
inputs
tensor([[ 73.,  67.,  43.],
        [ 91.,  88.,  64.],
        [ 87., 134.,  58.],
        [102.,  43.,  37.],
        [ 69.,  96.,  70.],
        [ 74.,  66.,  43.],
        [ 91.,  87.,  65.],
        [ 88., 134.,  59.],
        [101.,  44.,  37.],
        [ 68.,  96.,  71.],
        [ 73.,  66.,  44.],
        [ 92.,  87.,  64.],
        [ 87., 135.,  57.],
        [103.,  43.,  36.],
        [ 68.,  97.,  70.]])
for xb, yb in train_dl:
    print(xb)
    print(yb)
    break
tensor([[102.,  43.,  37.],
        [ 91.,  87.,  65.],
        [ 69.,  96.,  70.],
        [ 88., 134.,  59.],
        [ 74.,  66.,  43.]])
tensor([[ 22.,  37.],
        [ 80., 102.],
        [103., 119.],
        [118., 132.],
        [ 57.,  69.]])

nn.Linear

Instead of initializing the weights and biases manually, we can define the model using nn.Linear, which does it for us.

model = nn.Linear(3, 2)
print(model.weight)
print(model.bias)
Parameter containing:
tensor([[-0.1637,  0.0519, -0.1459],
        [-0.2050,  0.2159, -0.0023]], requires_grad=True)
Parameter containing:
tensor([-0.1157, -0.1562], requires_grad=True)
list(model.parameters())
[Parameter containing:
 tensor([[-0.1637,  0.0519, -0.1459],
         [-0.2050,  0.2159, -0.0023]], requires_grad=True),
 Parameter containing:
 tensor([-0.1157, -0.1562], requires_grad=True)]
preds = model(inputs)
preds
tensor([[-14.8669,  -0.7540],
        [-19.7889,   0.0420],
        [-15.8733,  10.8090],
        [-19.9838, -11.8685],
        [-16.6477,   6.2663],
        [-15.0825,  -1.1750],
        [-19.9867,  -0.1762],
        [-16.1830,  10.6017],
        [-19.7683, -11.4475],
        [-16.6298,   6.4690],
        [-15.0646,  -0.9722],
        [-20.0045,  -0.3789],
        [-15.6756,  11.0272],
        [-20.0016, -12.0712],
        [-16.4321,   6.6873]], grad_fn=<AddmmBackward>)

Loss Function

import torch.nn.functional as F
loss_fn = F.mse_loss
loss = loss_fn(model(inputs), targets)
print(loss)
tensor(9453.6309, grad_fn=<MseLossBackward>)

Optimizer

We will use stochastic gradient descent: torch.optim.SGD.

opt = torch.optim.SGD(model.parameters(), lr=1e-5) # lr is the learning rate
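Each call to opt.step() applies essentially the same update we wrote by hand earlier; a rough sketch of what one plain SGD step does (illustrative only, not the actual optimizer source):

with torch.no_grad():
    for p in model.parameters():
        if p.grad is not None:
            p -= 1e-5 * p.grad   # parameter minus learning rate times gradient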

Train the Model

def fit(num_epochs, model, loss_fn, opt, train_dl):
    
    for epoch in range(num_epochs):
        
        for xb, yb in train_dl:

            pred = model(xb) # generate predictions

            loss = loss_fn(pred, yb) # calculate the loss

            loss.backward() # compute gradients

            opt.step() # update parameters using the gradients

            opt.zero_grad() # reset the gradients to zero
        
        if (epoch+1) % 10 == 0:
            print('Epoch [{}/{}], Loss: {:.4f}'.format(epoch+1, num_epochs, loss.item()))
            
            
fit(100, model, loss_fn, opt, train_dl)
Epoch [10/100], Loss: 1473.9495
Epoch [20/100], Loss: 1101.0323
Epoch [30/100], Loss: 1247.5220
Epoch [40/100], Loss: 1066.2527
Epoch [50/100], Loss: 1192.7886
Epoch [60/100], Loss: 1239.0150
Epoch [70/100], Loss: 916.5994
Epoch [80/100], Loss: 986.2520
Epoch [90/100], Loss: 1190.9945
Epoch [100/100], Loss: 1572.8744
preds = model(inputs)
preds
tensor([[ 62.2054,  75.3431],
        [ 81.8397,  99.3478],
        [ 75.8782,  91.9953],
        [ 78.3076,  94.4091],
        [ 70.4413,  85.9155],
        [ 62.8510,  76.1146],
        [ 82.2799,  99.9035],
        [ 76.9230,  93.2708],
        [ 77.6620,  93.6375],
        [ 70.2358,  85.6997],
        [ 62.6455,  75.8988],
        [ 82.4854, 100.1193],
        [ 75.4381,  91.4396],
        [ 78.5131,  94.6249],
        [ 69.7957,  85.1440]], grad_fn=<AddmmBackward>)
targets
tensor([[ 56.,  70.],
        [ 81., 101.],
        [119., 133.],
        [ 22.,  37.],
        [103., 119.],
        [ 57.,  69.],
        [ 80., 102.],
        [118., 132.],
        [ 21.,  38.],
        [104., 118.],
        [ 57.,  69.],
        [ 82., 100.],
        [118., 134.],
        [ 20.,  38.],
        [102., 120.]])

Predictions on a New Input Batch

model(torch.tensor([[75, 63, 44.]])) # a batch with a single input row yields a batch with a single prediction
tensor([[63.9573, 77.4678]], grad_fn=<AddmmBackward>)
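If we want the prediction as plain numbers, detached from the autograd graph (an optional extra step, not in the original notebook):

new_input = torch.tensor([[75, 63, 44.]])
print(model(new_input).detach()) # same values, without a grad_fn attached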