Learn how to select the best performing linear regression for univariate models (2024)

freeCodeCamp

By Björn Hartmann

Find out which linear regression model is the best fit for your data

Inspired by a question after my previous article, I want to tackle an issue that often comes up after trying different linear models: you need to choose which model to use. More specifically, Khalifa Ardi Sidqi asked:

“How to determine which model suits best to my data? Do I just look at the R square, SSE, etc.?

As the interpretation of that model (quadratic, root, etc.) will be very different, won’t it be an issue?”

The second part of the question is easy to answer. First, find the model that best fits your data, and then interpret its results. It helps to have ideas about how your data might be explained, but interpret only the best model.

The rest of this article will address the first part of his question. Please note that I will share my approach on how to select a model. There are multiple ways, and others might do it differently. But I will describe the way that works best for me.

In addition, this approach only applies to univariate models. Univariate models have just one input variable. I am planning a further article, where I will show you how to assess multivariate models with more input variables. For today, however, let us focus on the basics and univariate models.

To practice and get a feeling for this, I wrote a small ShinyApp. Use it and play around with different datasets and models. Notice how the parameters change, and become more confident in assessing simple linear models. Finally, you can also use the app as a framework for your own data. Just copy it from GitHub.

[Screenshot of the ShinyApp. In the original article, clicking the image opens an interactive version.]

Use the Adjusted R2 for univariate models

If you use only one input variable, the adjusted R2 value gives you a good indication of how well your model performs. It indicates how much of the variation in your data is explained by your model.

In contrast to the simple R2, the adjusted R2 takes the number of input factors into account. It penalizes too many input factors and favors parsimonious models.

In the screenshot above, you can see two models with values of 71.3% and 84.32%. By this measure, the second model is better than the first one. Models with low values, however, can still be useful, because the adjusted R2 is sensitive to the amount of noise in your data. For that reason, compare this indicator only between models fitted to the same dataset, not across different datasets.
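To make the comparison concrete, here is a minimal Python sketch (not the ShinyApp's code) that fits three candidate models to one synthetic dataset and ranks them by adjusted R2. The data, the helper names `fit_simple_ols` and `adjusted_r2`, and the choice of candidate transformations are assumptions made for illustration. Each candidate is a univariate model of the form y = a + b * f(x), so all of them use exactly one input.

```python
import math
import random


def fit_simple_ols(zs, ys):
    """Least-squares fit of y = a + b*z; returns (a, b)."""
    n = len(zs)
    mean_z = sum(zs) / n
    mean_y = sum(ys) / n
    szz = sum((z - mean_z) ** 2 for z in zs)
    szy = sum((z - mean_z) * (y - mean_y) for z, y in zip(zs, ys))
    b = szy / szz
    return mean_y - b * mean_z, b


def adjusted_r2(ys, predictions, n_inputs=1):
    """Adjusted R2 = 1 - (1 - R2) * (n - 1) / (n - n_inputs - 1)."""
    n = len(ys)
    mean_y = sum(ys) / n
    sse = sum((y - p) ** 2 for y, p in zip(ys, predictions))
    sst = sum((y - mean_y) ** 2 for y in ys)
    r2 = 1 - sse / sst
    return 1 - (1 - r2) * (n - 1) / (n - n_inputs - 1)


# Synthetic data (assumed): a quadratic trend plus Gaussian noise.
random.seed(0)
xs = [float(x) for x in range(1, 101)]
ys = [3.0 + 0.5 * x ** 2 + random.gauss(0, 200) for x in xs]

# Candidate univariate models, each of the form y = a + b * f(x).
candidates = {
    "linear": lambda x: x,
    "quadratic": lambda x: x ** 2,
    "root": math.sqrt,
}

scores = {}
for name, transform in candidates.items():
    zs = [transform(x) for x in xs]
    a, b = fit_simple_ols(zs, ys)
    predictions = [a + b * z for z in zs]
    scores[name] = adjusted_r2(ys, predictions)
    print(f"{name:>9}: adjusted R2 = {scores[name]:.4f}")

# All candidates were fitted to the same dataset, so the scores are comparable.
best = max(scores, key=scores.get)
print("Highest adjusted R2:", best)
```

Because every candidate is scored on the same `ys`, the ranking is meaningful. The same numbers would not be comparable with scores from a different dataset.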

Usually, there is little need for the SSE

Before you read on, let’s make sure we are talking about the same SSE. On Wikipedia, SSE refers to the sum of squared errors. In some statistics textbooks, however, SSE can refer to the explained sum of squares (the exact opposite). So for now, suppose SSE refers to the sum of squared errors.

With that definition, the adjusted R2 is approximately 1 - SSE/SST, where SST refers to the total sum of squares.

I do not want to dive deeper into the math behind this. What I want to show you is that the adjusted R2 is computed with the SSE. So the SSE usually does not give you any additional information.

Furthermore, the adjusted R2 is normalized: it never exceeds one and, for any reasonably fitting model, lies between zero and one (it can drop below zero only for a very poor fit). So it is easier for you and others to interpret an unfamiliar model with an adjusted R2 of 75% than one with an SSE of 394, even though both figures might describe the same model.
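As a rough illustration of why the SSE is harder to read on its own, the sketch below computes the SSE, R2 = 1 - SSE/SST, and the adjusted R2 = 1 - (SSE / (n - p - 1)) / (SST / (n - 1)) for the same measurements expressed in two units, where p is the number of inputs. The toy values, the units, and the function name `fit_statistics` are assumptions made for this example; the predictions stand in for the output of some fitted model.

```python
def fit_statistics(ys, predictions, n_inputs=1):
    """Return (SSE, R2, adjusted R2) for actual values and predictions."""
    n = len(ys)
    mean_y = sum(ys) / n
    sse = sum((y - p) ** 2 for y, p in zip(ys, predictions))  # sum of squared errors
    sst = sum((y - mean_y) ** 2 for y in ys)                  # total sum of squares
    r2 = 1 - sse / sst
    adjusted = 1 - (sse / (n - n_inputs - 1)) / (sst / (n - 1))
    return sse, r2, adjusted


# Assumed toy data: actual values and model predictions in kilograms.
actual_kg = [2.0, 4.1, 6.2, 7.9, 10.1, 12.2]
predicted_kg = [2.1, 4.0, 6.0, 8.0, 10.0, 12.0]

# The very same data expressed in grams.
actual_g = [1000 * v for v in actual_kg]
predicted_g = [1000 * v for v in predicted_kg]

sse_kg, r2_kg, adj_kg = fit_statistics(actual_kg, predicted_kg)
sse_g, r2_g, adj_g = fit_statistics(actual_g, predicted_g)

print(f"kilograms: SSE = {sse_kg:.4f}, R2 = {r2_kg:.4f}, adjusted R2 = {adj_kg:.4f}")
print(f"grams:     SSE = {sse_g:.1f}, R2 = {r2_g:.4f}, adjusted R2 = {adj_g:.4f}")
```

The SSE changes by a factor of one million when the unit changes, while R2 and the adjusted R2 stay the same. That unit dependence is one reason a bare SSE is hard to interpret for someone unfamiliar with the data.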

Have a look at the residuals or error terms!

Error terms, the so-called residuals, are often ignored. They often tell you more than you might think.

The residuals are the differences between the actual values and your predicted values (actual minus predicted).

Their benefit is that they show you both the magnitude and the direction of your errors. Let’s have a look at an example:

[Figure: We do not want residuals to vary like this around zero.]

Here, I tried to predict a polynomial dataset with a linear function. Analyzing the residuals shows that there are areas where the model has an upward or downward bias.

For 50 < x < 100, the residuals are above zero. So in this area, the actual values have been higher than the predicted values: our model has a downward bias.

For 100 < x < 150, however, the residuals are below zero. Thus, the actual values have been lower than the predicted values: the model has an upward bias.

It is always good to know whether your model's predictions are too high or too low. But you usually do not want to see patterns like this.

The residuals should be zero on average (as indicated by the mean) and they should be evenly scattered around zero, without a visible pattern. Predicting the same dataset with a third-degree polynomial function suggests a much better fit:

[Figure: Here the residuals are evenly distributed around zero, suggesting a much better fit.]
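The same check can be done numerically. The following sketch is an illustration under assumed data, not a reproduction of the plots: it fits a straight line and a quadratic model to a synthetic curved dataset, computes residuals as actual minus predicted, and averages them over the lower, middle, and upper third of the x range. The helper names and the dataset are hypothetical.

```python
import random


def fit_simple_ols(zs, ys):
    """Least-squares fit of y = a + b*z; returns (a, b)."""
    n = len(zs)
    mean_z = sum(zs) / n
    mean_y = sum(ys) / n
    szz = sum((z - mean_z) ** 2 for z in zs)
    szy = sum((z - mean_z) * (y - mean_y) for z, y in zip(zs, ys))
    b = szy / szz
    return mean_y - b * mean_z, b


def residuals_for(transform, xs, ys):
    """Fit y = a + b * transform(x) and return residuals (actual - predicted)."""
    zs = [transform(x) for x in xs]
    a, b = fit_simple_ols(zs, ys)
    return [y - (a + b * z) for z, y in zip(zs, ys)]


def mean(values):
    return sum(values) / len(values)


def mean_by_third(residuals):
    """Average residual in the lower, middle, and upper third (xs must be sorted)."""
    k = len(residuals) // 3
    return [mean(residuals[:k]), mean(residuals[k:2 * k]), mean(residuals[2 * k:])]


# Synthetic curved data (assumed), with x already sorted in ascending order.
random.seed(1)
xs = [float(x) for x in range(150)]
ys = [0.05 * x ** 2 + random.gauss(0, 20) for x in xs]

linear_res = residuals_for(lambda x: x, xs, ys)
quadratic_res = residuals_for(lambda x: x ** 2, xs, ys)

linear_thirds = mean_by_third(linear_res)
quadratic_thirds = mean_by_third(quadratic_res)

print("overall mean, linear model:", round(mean(linear_res), 6))
print("mean per third, linear:   ", [round(m, 1) for m in linear_thirds])
print("mean per third, quadratic:", [round(m, 1) for m in quadratic_thirds])
```

Note that the overall mean of the residuals is essentially zero for both models, which is always the case for a least-squares fit with an intercept. The bias only becomes visible when you look at the residuals region by region, which is why the plot is more informative than the mean alone.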

In addition, you can observe whether the variance of your errors changes as x increases. In statistics, this is called heteroscedasticity. You can account for it easily with robust standard errors. Otherwise, your hypothesis tests are likely to be wrong.
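As a hedged sketch of what this means in practice, the code below generates data whose noise grows with x, compares the spread of the residuals in the lower and upper half of the x range, and computes the standard error of the slope twice: once with the conventional formula and once with the HC0 (White) heteroscedasticity-robust formula for a simple regression. The dataset and function names are assumptions for illustration; statistical packages offer several robust variants, and this shows only the simplest one.

```python
import math
import random
import statistics


def fit_simple_ols(xs, ys):
    """Least-squares fit of y = a + b*x; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def slope_standard_errors(xs, ys):
    """Return (residuals, conventional SE, HC0 robust SE) for the slope."""
    n = len(xs)
    a, b = fit_simple_ols(xs, ys)
    mean_x = sum(xs) / n
    residuals = [y - (a + b * x) for x, y in zip(xs, ys)]
    sxx = sum((x - mean_x) ** 2 for x in xs)
    # Conventional formula: assumes the same error variance everywhere.
    conventional = math.sqrt(sum(e ** 2 for e in residuals) / (n - 2) / sxx)
    # HC0 formula: weights each squared residual by (x - mean_x)^2.
    robust = math.sqrt(
        sum((x - mean_x) ** 2 * e ** 2 for x, e in zip(xs, residuals))
    ) / sxx
    return residuals, conventional, robust


# Synthetic data (assumed): the noise standard deviation grows with x.
random.seed(2)
xs = [float(x) for x in range(1, 201)]
ys = [2.0 + 3.0 * x + random.gauss(0, 0.5 * x) for x in xs]

residuals, conventional_se, robust_se = slope_standard_errors(xs, ys)

# Informal check: compare the residual spread in the lower and upper half of x.
half = len(xs) // 2
spread_low = statistics.pstdev(residuals[:half])
spread_high = statistics.pstdev(residuals[half:])

print(f"residual spread, lower half of x: {spread_low:.1f}")
print(f"residual spread, upper half of x: {spread_high:.1f}")
print(f"conventional SE of slope: {conventional_se:.4f}")
print(f"HC0 robust SE of slope:   {robust_se:.4f}")
```

Robust standard errors leave the fitted line unchanged. They only change the uncertainty attached to the coefficients, which is what the hypothesis tests rely on.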

Histogram of residuals

Finally, the histogram summarizes the magnitude of your error terms. It shows the bandwidth of the errors and how often errors of each size occurred.

[Figure: Two histograms of residuals. The right histogram indicates a smaller bandwidth of errors than the left one, so it seems to be a better fit.]

The above screenshots show two models for the same dataset. In the left histogram, errors range from -338 to 520.

In the right histogram, errors range from -293 to 401. So the extreme errors are much smaller. Furthermore, most errors in the model of the right histogram are closer to zero. So I would favor the right model.
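A comparison like this can also be made without a plot. The sketch below, using an assumed synthetic dataset and hypothetical helper names, prints a simple text histogram of the residuals for two models and reports the range of the errors and the share of errors within a chosen band around zero. The bin width of 25 and the band of 50 are arbitrary choices for this example.

```python
import math
import random
from collections import Counter


def fit_simple_ols(zs, ys):
    """Least-squares fit of y = a + b*z; returns (a, b)."""
    n = len(zs)
    mean_z = sum(zs) / n
    mean_y = sum(ys) / n
    szz = sum((z - mean_z) ** 2 for z in zs)
    szy = sum((z - mean_z) * (y - mean_y) for z, y in zip(zs, ys))
    b = szy / szz
    return mean_y - b * mean_z, b


def residuals_for(transform, xs, ys):
    """Fit y = a + b * transform(x) and return residuals (actual - predicted)."""
    zs = [transform(x) for x in xs]
    a, b = fit_simple_ols(zs, ys)
    return [y - (a + b * z) for z, y in zip(zs, ys)]


def text_histogram(residuals, bin_width):
    """One line per bin; bin k covers [k * bin_width, (k + 1) * bin_width)."""
    counts = Counter(math.floor(e / bin_width) for e in residuals)
    return [
        f"{k * bin_width:>6.0f} to {(k + 1) * bin_width:>6.0f} | {'#' * counts[k]}"
        for k in range(min(counts), max(counts) + 1)
    ]


def share_within(residuals, band):
    """Fraction of residuals whose absolute value is at most `band`."""
    return sum(1 for e in residuals if abs(e) <= band) / len(residuals)


# Synthetic curved data (assumed).
random.seed(1)
xs = [float(x) for x in range(150)]
ys = [0.05 * x ** 2 + random.gauss(0, 20) for x in xs]

linear_res = residuals_for(lambda x: x, xs, ys)
quadratic_res = residuals_for(lambda x: x ** 2, xs, ys)

for name, res in [("linear", linear_res), ("quadratic", quadratic_res)]:
    print(f"{name} model: errors from {min(res):.0f} to {max(res):.0f}, "
          f"{share_within(res, 50):.0%} within +/-50")
    print("\n".join(text_histogram(res, 25)))
    print()
```

The model with the narrower range and the larger share of errors close to zero is the one to favor, which mirrors the reading of the two histograms above.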

Summary

When choosing a linear model, these are factors to keep in mind:

  • Only compare linear models for the same dataset.
  • Find a model with a high adjusted R2.
  • Make sure this model's residuals are evenly distributed around zero.
  • Make sure the errors of this model are within a small bandwidth.


If you have any questions, write a comment below or contact me. I appreciate your feedback.
