Linear Regression & Gradient Descent


Linear regression is one of the fundamental algorithms in machine learning. Understanding its internal workflow helps in grasping the main concepts behind many other algorithms in data science. Linear regression has a wide range of applications where it is used to predict a continuous variable.

Before diving into the inner workings of linear regression, let us first understand what a regression problem is.

Regression is a machine learning problem that aims to predict the value of a continuous variable given a feature vector, which is usually denoted as x = <x₁, x₂, x₃, …, xₙ> where xᵢ represents the value of the i-th feature in the data. In order for a model to be able to make predictions, it has to be trained on a dataset containing mappings from feature vectors x to corresponding values of a target variable y. The learning process depends on the type of algorithm used for a certain task.


In the case of linear regression, the model learns a vector of weights w = <w₁, w₂, w₃, …, wₙ> and a bias parameter b that approximate a target value y as <w, x> + b = x₁ * w₁ + x₂ * w₂ + x₃ * w₃ + … + xₙ * wₙ + b in the best possible way for every dataset observation (x, y).

When building a linear regression model, the ultimate goal is to find a vector of weights w and a bias term b that bring the predicted value ŷ as close as possible to the real target value y for all the inputs:

ŷ = <w, x> + b ≈ y
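As a quick illustration, this prediction can be computed with a couple of lines of NumPy; the feature values, weights, and bias below are arbitrary numbers chosen only for the example:

```python
import numpy as np

# Arbitrary example values: a feature vector x, a weight vector w and a bias b
x = np.array([1.5, -2.0, 0.7])
w = np.array([0.4, 0.1, -0.3])
b = 0.25

# Prediction: inner product <w, x> plus the bias term
y_hat = np.dot(w, x) + b
print(y_hat)  # 0.4*1.5 + 0.1*(-2.0) + (-0.3)*0.7 + 0.25 = 0.44
```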

To make things easier, the example we are going to look at uses a dataset with a single feature x. Therefore, x and w are one-dimensional. For simplicity, let us get rid of the inner product notation and rewrite the equation above in the following way:

ŷ = w * x + b

In order to train an algorithm, a loss function has to be chosen. The loss function measures how well or poorly the algorithm predicted on a set of objects at a single training iteration. Based on its value, the algorithm adjusts the model's parameters in the hope that the model will produce smaller errors in the future.

One of the most popular loss functions is Mean Squared Error (or simply MSE), which measures the average squared deviation between predicted and true values.

MSE = (1 / n) * Σᵢ (yᵢ − ŷᵢ)²
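A minimal NumPy version of this formula might look as follows (library implementations such as sklearn.metrics.mean_squared_error compute the same quantity):

```python
import numpy as np

def mse(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Mean Squared Error: average squared deviation between true and predicted values."""
    return float(np.mean((y_true - y_pred) ** 2))

# Example: ((3 - 2.5)^2 + (5 - 5)^2 + (1 - 2)^2) / 3 = (0.25 + 0 + 1) / 3
print(mse(np.array([3.0, 5.0, 1.0]), np.array([2.5, 5.0, 2.0])))  # ≈ 0.4167
```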

Gradient descent is an iterative algorithm that updates the weight vector to minimize a given loss function by searching for a local minimum. It uses the following formula on each iteration:

w′ = w − α * ∇f(w)
  • w is the vector of model weights at the current iteration; the newly computed weights are assigned to w′. During the first iteration of the algorithm, weights are usually initialized randomly, but other strategies exist as well.
  • α is usually a small positive value, also known as the learning rate: a hyperparameter that controls the speed of convergence towards a local minimum.
  • ∇ denotes the gradient: the vector of partial derivatives of the loss function. In the current example, the model has two parameters, the weight w and the bias b, so two partial derivatives need to be computed (f represents the loss function):
∇f = (∂f / ∂w, ∂f / ∂b)

The update formulas can be rewritten in the following way:

w′ = w − α * ∂f / ∂w
b′ = b − α * ∂f / ∂b

Right now the objective is to find the partial derivatives of f. Assuming that MSE is chosen as the loss function, let us compute it for a single observation (n = 1 in the MSE formula), so f = (y − ŷ)² = (y − w * x − b)².

∂f / ∂w = −2 * x * (y − w * x − b)
∂f / ∂b = −2 * (y − w * x − b)

The process of adjusting the model's weights based on a single object is called stochastic gradient descent.
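A minimal sketch of one such stochastic update, implementing the two derivatives above directly; the function and example values are mine, not taken from the article's source code:

```python
def sgd_step(w: float, b: float, x: float, y: float, alpha: float) -> tuple[float, float]:
    """One stochastic gradient descent step for f = (y - w*x - b)^2."""
    error = y - w * x - b           # residual for this single observation
    grad_w = -2.0 * x * error       # df/dw
    grad_b = -2.0 * error           # df/db
    return w - alpha * grad_w, b - alpha * grad_b

# Example: a single update starting from w = 0, b = 0 for the point (x=2, y=5)
w, b = sgd_step(0.0, 0.0, x=2.0, y=5.0, alpha=0.01)
print(w, b)  # w = 0.2, b = 0.1
```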

In the section above, model parameters were updated by calculating MSE for a single object (n = 1). In fact, it is possible to perform gradient descent on several objects in a single iteration. This way of updating weights is called batch gradient descent.

Formulas for updating weights in this case can be obtained in a very similar manner to stochastic gradient descent in the previous section. The only difference is that the number of objects n has to be taken into consideration: the sum of the terms over all objects in a batch is computed and then divided by n, the batch size.

∂f / ∂w = −(2 / n) * Σᵢ xᵢ * (yᵢ − w * xᵢ − b)
∂f / ∂b = −(2 / n) * Σᵢ (yᵢ − w * xᵢ − b)
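Putting the batch formulas together gives a small but complete training loop; the synthetic data and hyperparameters below are assumptions made only for illustration:

```python
import numpy as np

def batch_gradient_descent(x, y, alpha=0.01, n_iters=2000):
    """Fit y ≈ w*x + b by batch gradient descent on the MSE loss."""
    w, b = 0.0, 0.0
    n = len(x)
    for _ in range(n_iters):
        error = y - w * x - b                    # residuals for the whole batch
        grad_w = -2.0 / n * np.sum(x * error)    # df/dw averaged over the batch
        grad_b = -2.0 / n * np.sum(error)        # df/db averaged over the batch
        w -= alpha * grad_w
        b -= alpha * grad_b
    return w, b

# Synthetic data generated from y = 2x + 1 with some noise
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=100)
y = 2.0 * x + 1.0 + rng.normal(0, 0.5, size=100)
w, b = batch_gradient_descent(x, y)
print(w, b)  # should be close to 2 and 1
```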

When dealing with a dataset consisting of only a single feature, the regression results can be easily visualized on a 2D plot. The horizontal axis represents values of the feature while the vertical axis contains target values.

The quality of a linear regression model can be visually evaluated by how closely it fits the dataset points: the smaller the average distance from the dataset points to the line, the better the model is.

[Figure: dataset points scattered on a 2D plot with the fitted regression line passing through them]
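A possible sketch of such a plot with matplotlib; the data and the fitted parameters here are assumed values for illustration:

```python
import matplotlib.pyplot as plt
import numpy as np

# Example data and parameters; in practice these come from a fitted model
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=100)
y = 2.0 * x + 1.0 + rng.normal(0, 0.5, size=100)
w, b = 2.0, 1.0  # assumed fitted weight and bias

plt.scatter(x, y, s=12, label="dataset points")
line_x = np.linspace(x.min(), x.max(), 100)
plt.plot(line_x, w * line_x + b, color="red", label="fitted line ŷ = w*x + b")
plt.xlabel("feature x")
plt.ylabel("target y")
plt.legend()
plt.show()
```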

If a dataset contains more features, then visualization can be done by using dimensionality reduction techniques like PCA or t-SNE to represent the features in a lower dimensionality. After that, the new features are plotted on 2D or 3D plots as usual.
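For instance, a rough sketch with scikit-learn's PCA, projecting a synthetic multi-feature dataset onto its first principal component before plotting (the data here is made up purely for demonstration):

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA

# X is a (n_samples, n_features) matrix with several features
rng = np.random.default_rng(1)
X = rng.normal(size=(100, 5))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + rng.normal(0, 0.3, size=100)

# Project the features onto a single principal component for a 2D plot
x_reduced = PCA(n_components=1).fit_transform(X).ravel()
plt.scatter(x_reduced, y, s=12)
plt.xlabel("first principal component")
plt.ylabel("target y")
plt.show()
```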

Linear regression has a set of advantages:

  1. Training speed. Due to the simplicity of the algorithm, linear regression can be trained rapidly compared to more complex machine learning algorithms. Moreover, it can be implemented through the least squares method (LSM), which is also relatively fast and easy to understand (a sketch of this approach is given after the list).
  2. Interpretability. A linear regression equation built for several features can be easily interpreted in terms of feature importance: the higher the absolute value of a feature's coefficient, the more effect it has on the final prediction (provided the features are on the same scale, as discussed in the normalization point below).
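As a quick illustration of the least squares route mentioned in point 1, the single-feature problem can also be solved in closed form; this is only a sketch, and np.polyfit gives the same result out of the box:

```python
import numpy as np

def least_squares_fit(x: np.ndarray, y: np.ndarray) -> tuple[float, float]:
    """Closed-form least-squares solution for y ≈ w*x + b (single feature)."""
    x_mean, y_mean = x.mean(), y.mean()
    w = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    b = y_mean - w * x_mean
    return float(w), float(b)

# Synthetic data from y = 2x + 1 plus noise; the fit should recover w ≈ 2, b ≈ 1
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=100)
y = 2.0 * x + 1.0 + rng.normal(0, 0.5, size=100)
print(least_squares_fit(x, y))  # same result as np.polyfit(x, y, deg=1)
```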

On the other hand, it comes with several disadvantages:

  1. Data assumptions. Before fitting a linear regression model, it is important to check the type of dependency between the output and the input features. If it is linear, then there should not be any issue with fitting it. Otherwise, the model is normally not able to fit the data well since the equation has only linear terms in it. In fact, it is possible to add higher-degree terms to the equation to turn the algorithm into polynomial regression, for instance. However, in reality, without a lot of domain knowledge it is often difficult to correctly foresee the type of dependency. This is one of the reasons why linear regression might not adapt to given data.
  2. Multicollinearity problem. Multicollinearity occurs when two or more predictors are highly correlated with each other. Imagine a situation where a change in one variable influences another variable, while the trained model has no information about this relationship. When these changes are large, it is difficult for the model to remain stable during the inference phase on unseen data, which leads to overfitting. Furthermore, the final regression coefficients might also be unstable to interpret because of this.
  3. Data normalization. In order to use linear regression as a feature importance tool, the data has to be normalized or standardized. This will make sure that all of the final regression coefficients are on the same scale and can be correctly interpreted (a sketch is given after this list).
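A short sketch of such preprocessing with scikit-learn's StandardScaler; the data and coefficients are synthetic, chosen only to show the effect of scaling:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LinearRegression

# Two features on very different scales (std 1 vs std 100)
rng = np.random.default_rng(2)
X = rng.normal(loc=5.0, scale=[1.0, 100.0], size=(200, 2))
y = 3.0 * X[:, 0] + 0.02 * X[:, 1]

# Standardize features to zero mean and unit variance before fitting
X_scaled = StandardScaler().fit_transform(X)
model = LinearRegression().fit(X_scaled, y)
print(model.coef_)  # coefficients are now comparable across features
```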

We have looked at linear regression, a simple but very popular algorithm in machine learning. Its core principles are used in more complex algorithms.

Though linear regression is rarely used in modern production systems, its simplicity often allows it to serve as a standard baseline in regression problems, which is then compared to more sophisticated solutions.

The source code used in the article can be found on GitHub:

All images unless otherwise noted are by the author.
