This post covers a linear regression implementation from scratch using PyTorch.

Linear Regression Equation : Overview

Linear regression fits a line of best fit to a data set, using one or more features as the inputs to a linear equation whose coefficients are learned from the data. It is an approach for modelling the relationship between a dependent variable and one or more independent variables.

In a linear regression model, each target (dependent) variable is estimated to be a weighted sum of the input variables, offset by some constant, known as a bias :

$$ Y = X.W^T + b \tag{1} $$


$$ Y = \begin{bmatrix}y_1\\y_2\\\vdots\\y_n\end{bmatrix}_{n\times 1} \hspace{1cm} X = \begin{bmatrix}x_{11} & x_{12} & \cdots & x_{1m} \\ x_{21} & x_{22} & \cdots & x_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ x_{n1} & x_{n2} & \cdots & x_{nm}\end{bmatrix}_{n\times m} \hspace{1cm} W^T = \begin{bmatrix}w_1\\ w_2\\ \vdots\\ w_m\end{bmatrix}_{m\times 1} \hspace{1cm} b = \begin{bmatrix}b_1\\ b_2\\ \vdots\\ b_n\end{bmatrix}_{n\times 1} $$

Here \(n\) is the number of observations and \(m\) is the number of features. Expanding the first row of equation 1 gives: $$ y_1 = w_1x_{11} + w_2x_{12} + \dots + w_mx_{1m} + b_1 $$
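To make the expansion concrete, here is a minimal sketch that evaluates the matrix form and the written-out sum for the first observation and checks that they agree. The sizes (4 observations, 3 features) and all numbers are made up for illustration.

```python
import torch

# Sizes and values chosen only for illustration: 4 observations, 3 features.
X = torch.tensor([[1.0, 2.0, 3.0],
                  [4.0, 5.0, 6.0],
                  [7.0, 8.0, 9.0],
                  [1.0, 0.0, 1.0]])
W = torch.tensor([[0.5, -1.0, 2.0]])   # one row of weights, shape (1, 3)
b = torch.tensor([0.1])                # a single bias, broadcast to every row

# Matrix form of equation 1: Y = X . W^T + b
Y = X @ W.t() + b                      # shape (4, 1)

# Expanded form for the first observation:
# y_1 = w_1*x_11 + w_2*x_12 + w_3*x_13 + b
y1_expanded = (W[0, 0] * X[0, 0] + W[0, 1] * X[0, 1]
               + W[0, 2] * X[0, 2] + b[0])

print(Y[0, 0].item(), y1_expanded.item())  # both are approximately 4.6
```

Both values come out the same, which is all the matrix notation is saying: each row of the result is the weighted sum for one observation.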

Now, let's take an example to make this concrete. Here, we take a look at advertising data, which looks as follows:

The above data consists of the sales of a particular product along with the advertising budget for the product in TV, radio and newspaper media. Our objective is to increase the sales of the product, and the quantity we can control is the advertising budget. So, if we determine the relationship between advertising budget and sales, we can figure out how to increase sales by changing the budget.

Here, the independent variables \((x_i)\) are the advertising budgets for each of the three media and the dependent variable \((y)\) is the sales of the product. The relationship between the independent and dependent variables in this data can be defined as: $$ y = w_1x_1 + w_2x_2 + w_3x_3 + b \tag{2} $$ where \(y\) is sales and \(x_1,x_2,x_3\) are the advertising budgets for TV, radio and newspaper respectively.

The above equation can be written in matrix form as: $$ y = X.W^T + b $$

where $$ X = \begin{bmatrix}x_1 & x_2 & x_3\end{bmatrix}_{1\times 3} \hspace{1cm} W = \begin{bmatrix}w_1 & w_2 & w_3\end{bmatrix}_{1\times 3} $$

Now that we have discussed the linear regression equation, let's move on to the implementation and discuss the concepts it relies on.

Implementation from Scratch (using PyTorch)

Importing relevant libraries


I am using the Advertising data set, which is also used in the ISLR book.

We can remove the inbuilt index in the data.

# removing the inbuilt index column
df.drop('Unnamed: 0', axis = 1, inplace=True)
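A column named 'Unnamed: 0' usually appears when a CSV was saved together with its row index, so the first header cell is empty. The sketch below reproduces that with a tiny in-memory CSV; the two data rows are made-up placeholders, not rows from the real file. It shows two equivalent ways of getting rid of the column.

```python
import io
import pandas as pd

# Made-up placeholder rows, for illustration only. The header starts with an
# empty name because the file was saved together with its row index.
csv_text = """,TV,radio,newspaper,sales
1,100.0,20.0,30.0,15.0
2,50.0,10.0,5.0,8.0
"""

df = pd.read_csv(io.StringIO(csv_text))
print(df.columns.tolist())   # the empty header cell becomes 'Unnamed: 0'

# Option 1: drop the column after loading (the approach used in this post).
dropped = df.drop('Unnamed: 0', axis=1)

# Option 2: tell pandas up front that the first column is the index.
indexed = pd.read_csv(io.StringIO(csv_text), index_col=0)

print(dropped.columns.tolist())
print(indexed.columns.tolist())
```

Either way, what remains are the three budget columns and the sales column.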

Now let's get some more information on the dataset.

Let's divide the dataset into target and input variables and then split it into train and test data.

x = df.drop('sales', axis =1).values
y = df[['sales']].values

# Converting the numpy array features to pytorch tensors.
inputs = torch.from_numpy(x).float()
targets = torch.from_numpy(y).float()

# Split Data into train and test
X_train,X_test,y_train,y_test = train_test_split(inputs,targets,test_size=0.20,random_state=0)

For the data split, I decided on an 80–20 split between the train and test sets.

Weights and Bias

Our model is simply a function that performs a matrix multiplication of the inputs and the weights w (transposed) and adds the bias b (see equation 2). So we start by initializing the weight matrix and the bias.
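A minimal sketch of such an initialization, assuming three input features (TV, radio, newspaper), a single target (sales) and random starting values. The seed is an addition of this sketch, used only to make the output reproducible.

```python
import torch

torch.manual_seed(0)  # assumption: seeded only so the sketch is reproducible

# One row of weights per target, one column per feature (TV, radio, newspaper).
w = torch.randn(1, 3, requires_grad=True)
# A single bias for the single target (sales).
b = torch.randn(1, requires_grad=True)

print(w.shape, b.shape)
print(w.requires_grad, b.requires_grad)
```

Setting requires_grad=True asks PyTorch to track operations on these tensors so that gradients with respect to them can be computed later.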


We can define the model as follows:

def model(x):
    return x @ w.t() + b

@ represents matrix multiplication in PyTorch, and the .t method returns the transpose of a tensor. The matrix obtained by passing the input data into the model is a set of predictions for the target variable (see equation 2).
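As a quick check of what the two pieces of syntax do, the sketch below spells out the same computation with torch.matmul and transpose. The numbers are made up, and the bias is left out to keep the focus on the multiplication.

```python
import torch

x = torch.tensor([[1.0, 2.0, 3.0],
                  [4.0, 5.0, 6.0]])      # 2 rows, 3 features: shape (2, 3)
w = torch.tensor([[0.5, -1.0, 2.0]])     # shape (1, 3)

via_operator = x @ w.t()                           # shape (2, 1)
via_function = torch.matmul(x, w.transpose(0, 1))  # same computation, spelled out

print(w.t().shape)     # transposing (1, 3) gives (3, 1)
print(via_operator)    # rows: 0.5 - 2 + 6 = 4.5 and 2 - 5 + 12 = 9.0
```

The transpose is what makes the shapes line up: (2, 3) times (3, 1) yields one prediction per input row.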

Loss Function

We need a way to evaluate how well our model is performing. We can compare the model’s predictions with the actual targets, using the following method:

  • Calculate the difference between the two matrices (preds and targets).
  • Square all elements of the difference matrix to remove negative values.
  • Calculate the average of the elements in the resulting matrix.

The result is a single number, known as the mean squared error (MSE).

def mse(t1, t2):
    diff = t1-t2
    return torch.sum(diff*diff)/diff.numel()

torch.sum returns the sum of all the elements in a tensor, and the .numel method returns the number of elements in a tensor.
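A small sanity check of this function on made-up numbers, compared against PyTorch's built-in torch.nn.functional.mse_loss, which averages the squared differences by default:

```python
import torch
import torch.nn.functional as F

def mse(t1, t2):
    diff = t1 - t2
    return torch.sum(diff * diff) / diff.numel()

# Made-up predictions and targets, for illustration only.
preds = torch.tensor([[2.0], [4.0], [7.0]])
targets = torch.tensor([[1.0], [4.0], [9.0]])

# Differences are 1, 0, -2; squares are 1, 0, 4; their mean is 5/3.
print(mse(preds, targets))
print(F.mse_loss(preds, targets))  # built-in, default 'mean' reduction
```

The two results should match, so the hand-written version can be swapped for the built-in one later.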

Let’s compute the mean squared error for the current predictions of our model.

Here’s how we can interpret the result: the square root of the loss 76634.3438 is about 276.83, so a typical prediction is off from the actual target by roughly that much (this is the root mean squared error). That’s pretty bad, considering the numbers we are trying to predict are themselves in the range 1-27. The result is called the loss because it indicates how bad the model is at predicting the target variables: the lower the loss, the better the model.

Gradient Descent

We’ll now minimize the loss function using the gradient descent algorithm. Intuitively, gradient descent takes small steps down the slope of the cost function along each parameter dimension, with the size of each step determined by the partial derivative of the cost function with respect to that parameter and a learning rate multiplier \(\eta\). If tuned properly, the algorithm converges on the global minimum (the squared-error cost of linear regression is convex) by iteratively adjusting the weights \(\theta\), as shown here for two parameters:

$$ \theta_0 := \theta_0 - \eta\frac{\partial}{\partial\theta_0} J(\theta_0,\theta_1) $$ $$ \theta_1 := \theta_1 - \eta\frac{\partial}{\partial\theta_1} J(\theta_0,\theta_1) $$

Given that : $$ h_\theta(x) = \theta_0 + \theta_1x_1 $$ $$ J(\theta) = \frac{1}{2m}\sum\limits_{i = 1}^{m}(h_\theta(x^{(i)}) - y^{(i)})^2 $$
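Differentiating \(J\) with respect to each parameter gives the quantities the update rules need. This is the standard derivation, written out for the two-parameter hypothesis above:

$$ \frac{\partial}{\partial\theta_0} J(\theta_0,\theta_1) = \frac{1}{m}\sum\limits_{i = 1}^{m}(h_\theta(x^{(i)}) - y^{(i)}) $$ $$ \frac{\partial}{\partial\theta_1} J(\theta_0,\theta_1) = \frac{1}{m}\sum\limits_{i = 1}^{m}(h_\theta(x^{(i)}) - y^{(i)})\,x_1^{(i)} $$

The factor \(\frac{1}{2}\) in \(J\) cancels the 2 that comes from differentiating the square. The mse function defined earlier has no such factor, so its gradients are twice as large; in practice that constant is absorbed into the learning rate \(\eta\).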

For more on this, see the CS229 notes.

With PyTorch, we can automatically compute the gradient (derivative) of the loss w.r.t. the weights and biases, because they have requires_grad set to True.
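The sketch below shows what that looks like: calling .backward() on the loss fills the .grad attribute of every tensor created with requires_grad=True. To show that nothing mysterious is happening, it also computes the same gradients by hand and compares them. The data is random stand-in data, and the names err, w_grad_manual and b_grad_manual belong to this sketch.

```python
import torch

torch.manual_seed(0)
x = torch.rand(8, 3)            # random stand-in inputs: 8 rows, 3 features
y = torch.rand(8, 1)            # random stand-in targets
w = torch.randn(1, 3, requires_grad=True)
b = torch.randn(1, requires_grad=True)

preds = x @ w.t() + b
loss = torch.sum((preds - y) ** 2) / preds.numel()   # mean squared error
loss.backward()                 # fills w.grad and b.grad

# Hand-derived gradients of the mean squared error, for comparison.
with torch.no_grad():
    err = preds - y                               # shape (8, 1)
    w_grad_manual = 2 * (err.t() @ x) / len(x)    # shape (1, 3)
    b_grad_manual = 2 * err.mean()                # scalar

print(w.grad, w_grad_manual)
print(b.grad, b_grad_manual)
```

The gradients have the same shapes as w and b, which is what lets us subtract them directly in the update step.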

Adjust weights and biases using gradient descent

We’ll reduce the loss and improve our model using the gradient descent optimization algorithm, which has the following steps:

  1. Generate predictions

  2. Calculate the loss

  3. Compute gradients w.r.t the weights and biases

  4. Adjust the weights by subtracting a small quantity proportional to the gradient

  5. Reset the gradients to zero

Let’s implement the above step by step.
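A minimal sketch of one such update, with each line numbered to match the steps listed above. The inputs and targets are random stand-ins for the training data, and the learning rate of 1e-5 is an assumed value, chosen small because the inputs are large.

```python
import torch

torch.manual_seed(0)
# Random stand-in data: 20 rows, 3 budget columns, 1 target column.
inputs = torch.rand(20, 3) * 100
targets = torch.rand(20, 1) * 25

w = torch.randn(1, 3, requires_grad=True)
b = torch.randn(1, requires_grad=True)

def model(x):
    return x @ w.t() + b

def mse(t1, t2):
    diff = t1 - t2
    return torch.sum(diff * diff) / diff.numel()

lr = 1e-5  # assumed learning rate

preds = model(inputs)                 # 1. generate predictions
loss_before = mse(preds, targets)     # 2. calculate the loss
loss_before.backward()                # 3. compute gradients
with torch.no_grad():                 # do not track the update itself
    w -= w.grad * lr                  # 4. adjust the weights and the bias
    b -= b.grad * lr
    w.grad.zero_()                    # 5. reset the gradients to zero
    b.grad.zero_()

loss_after = mse(model(inputs), targets)
print(loss_before.item(), loss_after.item())
```

The update runs inside torch.no_grad() so that PyTorch does not record it as part of the computation graph. The gradients are reset because .backward() accumulates into .grad rather than overwriting it.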

Let’s take a look at the loss after 1 epoch.

We have already achieved a significant reduction in the loss, simply by adjusting the weights and biases slightly using gradient descent.

Train for multiple epochs

To reduce the loss further, we can repeat the process of adjusting the weights and biases using the gradients multiple times. Each iteration is called an epoch. Let’s train the model for 1000 epochs.
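A minimal sketch of such a training loop. The data is synthetic: the targets are generated from hypothetical coefficients so that there is a linear relationship to learn. The learning rate of 1e-5 is again an assumed value.

```python
import torch

torch.manual_seed(0)
# Synthetic stand-in data generated from hypothetical coefficients.
inputs = torch.rand(50, 3) * 100
true_w = torch.tensor([[0.05, 0.2, 0.0]])
targets = inputs @ true_w.t() + 3.0

w = torch.randn(1, 3, requires_grad=True)
b = torch.randn(1, requires_grad=True)

def model(x):
    return x @ w.t() + b

def mse(t1, t2):
    diff = t1 - t2
    return torch.sum(diff * diff) / diff.numel()

lr = 1e-5  # assumed learning rate
initial_loss = mse(model(inputs), targets).item()

for epoch in range(1000):
    loss = mse(model(inputs), targets)
    loss.backward()
    with torch.no_grad():
        w -= w.grad * lr
        b -= b.grad * lr
        w.grad.zero_()
        b.grad.zero_()

final_loss = mse(model(inputs), targets).item()
print(initial_loss, final_loss)
```

Each pass through the loop uses the whole data set at once, which is why one iteration counts as one epoch here.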

Now let's see the final loss:

As you can see, the loss is now much lower than what we started out with.


Let’s look at the model’s predictions and compare them with the targets.

y_preds = model(X_test)

Let's plot actual vs. predicted:

This whole process can also be done using PyTorch built-ins. Check it out:

Linear Regression Using Pytorch Builtins