When to use linear regression

By Christina Ellis / May 26, 2022 / Machine learning



Are you wondering when you should choose a linear regression model over a similar machine learning model? Well then you are in the right place! In this article we tell you everything you need to know to determine when you should reach for a linear regression model.

This article starts out with a discussion of what kind of outcome variables linear regression is typically used for. After that, some of the main advantages and disadvantages of linear regression are discussed. Finally, we provide specific examples of scenarios where you should and should not use a linear regression model.

What outcomes can you use linear regression for?

What types of outcome variables can you use linear regression for? Linear regression should be used when your outcome variable is a numeric variable. If your outcome variable is not numeric, then you should consider looking into other types of regression models.

For example, if you have a binary outcome then you can use a logistic regression model. If your outcome variable is a count variable, you can look into using a Poisson regression model.
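
The decision rule above can be sketched as a small helper. This is a hypothetical function written for illustration, not part of any library; it simply maps the type of the outcome variable to the regression family discussed in this section.

```python
# Hypothetical helper illustrating the rule above: the type of the
# outcome variable determines which regression family to reach for.

def choose_regression_family(outcome_type: str) -> str:
    """Map an outcome variable type to a suitable regression family."""
    mapping = {
        "numeric": "linear regression",   # continuous numeric outcome
        "binary": "logistic regression",  # yes/no outcome
        "count": "Poisson regression",    # non-negative integer counts
    }
    try:
        return mapping[outcome_type]
    except KeyError:
        raise ValueError(f"unrecognized outcome type: {outcome_type!r}")

print(choose_regression_family("numeric"))  # linear regression
print(choose_regression_family("count"))    # Poisson regression
```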

Advantages and disadvantages of linear regression

Are you wondering what the main advantages and disadvantages of linear regression models are? Here are some of the most important ones.

Advantages of linear regression models

  • Interpretable coefficients. One of the main advantages of linear regression models is that they have easily interpretable coefficients that come along with confidence intervals and statistical tests. This is very important if inference is a high priority in the project you are working on. Most other machine learning models do not have the same straightforward interpretation that linear regression models do.
  • No hyperparameters. Another advantage of linear regression is that it does not have hyperparameters that need to be tuned. You may need to preprocess your data and select which features to use in your model, but other than that there is no need to run different versions of your model with different hyperparameters.
  • Well understood. Another benefit of linear regression is that it is well studied and well understood. Most people who have taken an introductory statistics class have at least heard of linear regression. This means that it tends to be more popular with skeptical stakeholders who do not trust other machine learning models.
  • Fast inference. A final advantage of linear regression is that it has fast and simple inference that can be implemented even without the use of dedicated machine learning libraries. This means that it is easier to put linear regression models in production at companies that have not built out facilities for serving machine learning models.
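
Two of these advantages can be seen in a few lines of code. The sketch below (synthetic data, numpy only) fits ordinary least squares via the normal equations: the fitted slope is directly interpretable ("each unit of x adds about 2 to y"), and prediction at serving time is a single dot product, with no machine learning libraries required.

```python
import numpy as np

# Synthetic data with a known relationship: true intercept 1, true slope 2.
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=200)
y = 1.0 + 2.0 * x + rng.normal(scale=0.5, size=200)

# Fit ordinary least squares with a design matrix that includes an intercept.
X = np.column_stack([np.ones_like(x), x])
beta = np.linalg.lstsq(X, y, rcond=None)[0]  # [intercept, slope]

print(f"intercept ~ {beta[0]:.2f}, slope ~ {beta[1]:.2f}")

# "Inference" at serving time is just arithmetic -- no ML infrastructure needed.
def predict(x_new: float) -> float:
    return beta[0] + beta[1] * x_new
```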

Disadvantages of linear regression models

  • Thrown off by outliers. One disadvantage of linear regression is that it is easily thrown off by outliers in your dataset. If you are using a linear regression model, you should examine your input data and model artifacts to make sure that the model is not being unduly influenced by outliers.
  • Thrown off by correlated features. Another disadvantage of linear regression is that it is easily thrown off if you have multiple highly correlated features in your model.
  • Need to specify interactions. Another disadvantage of linear regression is that you need to explicitly specify interactions that the model should consider when you build your model. If you do not specify interactions between your features, the model will not recognize and account for these interactions.
  • Assumes linearity. Linear regression models also assume that there is a linear relationship between your model features and your outcome variable. This means that you might have to preprocess your model features to make the relationship more linear.
  • Cannot handle missing data. Most implementations of linear regression models cannot handle missing data natively. That means that you need to preprocess your data and handle the missing values before you run your model.
  • Not peak predictive performance. Another general disadvantage of linear regression is that it does not generally have peak predictive performance on tabular data. If prediction is your main goal, there are other machine learning models that tend to have better predictive performance.
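
The outlier sensitivity mentioned above is easy to demonstrate. In this synthetic sketch, ten clean points lie exactly on the line y = 2x; adding a single extreme point drags the least-squares slope well away from the trend in the rest of the data.

```python
import numpy as np

def ols_slope(x, y):
    """Least-squares slope for a simple linear fit with intercept."""
    X = np.column_stack([np.ones_like(x), x])
    return np.linalg.lstsq(X, y, rcond=None)[0][1]

x = np.arange(10, dtype=float)
y = 2.0 * x                      # clean data lies exactly on y = 2x
clean_slope = ols_slope(x, y)    # recovers slope 2.0

x_out = np.append(x, 9.0)        # one outlier at x = 9 ...
y_out = np.append(y, 100.0)      # ... with y far above the trend
outlier_slope = ols_slope(x_out, y_out)

# A single outlier pulls the fitted slope far from the true value of 2.
print(clean_slope, outlier_slope)
```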

When to use a linear regression model

When should you choose to use a linear regression model? Here are some examples of scenarios where you should use a linear regression model over another model.

  • Inference is your primary goal. If inference is your primary goal, you are often better off using linear regression than another machine learning model. Linear regression models give you estimates of the magnitude of the relationship between your features and your outcome variable along with other useful values like confidence intervals and statistical tests.
  • Baseline model. If you are looking for a simple baseline model that you can use to compare more complicated models against, a linear regression model is a decent choice. This is especially true if you have a relatively clean dataset that does not have many missing values or outliers. One of the main benefits linear regression has in these scenarios is that there are no hyperparameters that need to be tuned, so you only have to tune a single model.
  • Building trust. Since linear regression is a well studied and well publicized model, it is often a good model to reach for when you are still building trust with stakeholders that are skeptical of more complicated machine learning models. After you get buy-in for your linear regression model, you can start to compare the performance of other models to the performance of your linear regression model to show the business value that could be added by upgrading your model.
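
The baseline workflow described above can be sketched in a few lines: fit a linear model on a training split, score it on a held-out split, and record that score as the bar any more complex model must clear. The data here is synthetic, and R-squared is used as the comparison metric purely for illustration.

```python
import numpy as np

# Synthetic dataset: a mostly linear signal plus noise.
rng = np.random.default_rng(1)
x = rng.uniform(0, 5, size=300)
y = 3.0 * x + rng.normal(scale=1.0, size=300)

# Simple train/test split: first 200 points for fitting, last 100 for scoring.
train, test = slice(0, 200), slice(200, 300)
X_train = np.column_stack([np.ones(200), x[train]])
beta = np.linalg.lstsq(X_train, y[train], rcond=None)[0]

# Score the baseline on held-out data with R-squared.
X_test = np.column_stack([np.ones(100), x[test]])
pred = X_test @ beta
ss_res = np.sum((y[test] - pred) ** 2)
ss_tot = np.sum((y[test] - y[test].mean()) ** 2)
baseline_r2 = 1.0 - ss_res / ss_tot

# Any more complex model you try later should beat this number.
print(f"baseline R^2 = {baseline_r2:.3f}")
```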

When not to use linear regression

When should you not use linear regression? Here are some examples of cases where you should avoid using a linear regression model.

  • Small improvements in predictive performance have a large impact. If you are operating in a scenario where small improvements in predictive performance can have large impacts on the business, you may be better off reaching for another model. For example, gradient boosted trees tend to have better predictive performance than linear regression models. This is especially true in cases where the relationships between your features and your outcome variable are not perfectly linear.
  • You don’t have a lot of time to explore the data. Since linear regression is easily thrown off by things like missing data, outliers, and correlated features, it is not a great choice to turn to if you do not have a lot of time to clean and preprocess your data. In these types of situations, you might be better off turning to a tree-based model, such as a random forest model, that is less sensitive to these issues.
  • You have more features than observations. If you have more features in your model than you do observations in your dataset, a standard linear regression is not a good choice. You should either reduce the number of features you are using in your model or use another model that can handle this situation. Ridge regression is one example of a model that can handle this situation.
  • You have many correlated features. If you have many features in your model that are correlated with one another, you may be better off using ridge regression. This is a regularized version of regression that handles correlated features much better than a standard regression model.
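
The more-features-than-observations problem, and why ridge regression handles it, can be shown directly. With p = 50 features and only n = 20 observations, the matrix X'X is singular, so ordinary least squares has no unique solution; adding the ridge penalty lambda * I makes the system invertible. This sketch uses the closed-form ridge estimate on synthetic data.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 20, 50                     # more features (50) than observations (20)
X = rng.normal(size=(n, p))
true_beta = np.zeros(p)
true_beta[:3] = [2.0, -1.0, 0.5]  # only a few features actually matter
y = X @ true_beta + rng.normal(scale=0.1, size=n)

# X'X has rank at most n = 20, far below p = 50, so plain OLS is
# underdetermined here.
gram_rank = np.linalg.matrix_rank(X.T @ X)

# Closed-form ridge estimate: (X'X + lam * I)^{-1} X'y. The penalty makes
# the matrix full rank and the solution unique.
lam = 1.0
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

print(f"rank of X'X = {gram_rank}, ridge coefficients = {beta_ridge.shape}")
```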

Related articles

  • When to use logistic regression
  • When to use ordinal logistic regression
  • When to use multinomial regression
  • When to use random forests
  • When to use ridge regression
  • When to use LASSO
  • When to use Bayesian regression
  • When to use support vector machines
  • When to use gradient boosted trees
  • When to use Poisson regression
  • When to use neural networks
  • When to use mixed models
  • When to use generalized additive models

Are you trying to figure out which machine learning model is best for your next data science project? Check out our comprehensive guide on how to choose the right machine learning model.



