Simple Linear Regression (2024)

What is simple linear regression?

Simple linear regression is used to model the relationship between two continuous variables. Often, the objective is to predict the value of an output variable (or response) based on the value of an input (or predictor) variable.

When to use regression

See how to perform simple linear regression using statistical software

  • Download JMP to follow along using the sample data included with the software.
  • To see more JMP tutorials, visit the JMP Learning Library.

We are often interested in understanding the relationship among several variables. Scatterplots and scatterplot matrices can be used to explore potential relationships between pairs of variables. Correlation provides a measure of the linear association between pairs of variables, but it doesn’t tell us about more complex relationships. For example, if the relationship is curvilinear, the correlation might be near zero.
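The point about curvilinear relationships can be checked numerically. The sketch below uses made-up data (the x values and both y series are assumptions, not data from the cleaning example) and a hand-written Pearson correlation. A perfectly U-shaped relationship yields a correlation of zero, even though y is completely determined by x.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: x is symmetric about zero.
x = [-3, -2, -1, 0, 1, 2, 3]
y_curved = [v ** 2 for v in x]      # perfect U-shaped (curvilinear) relationship
y_linear = [2 * v + 1 for v in x]   # perfect straight-line relationship

r_curved = pearson_r(x, y_curved)
r_linear = pearson_r(x, y_linear)
print(f"curvilinear: r = {r_curved:.3f}")
print(f"linear:      r = {r_linear:.3f}")
```

A scatterplot of the curved data would reveal the pattern immediately, which is why plotting is worth doing before relying on a correlation coefficient.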

You can use regression to develop a more formal understanding of relationships between variables. In regression, and in statistical modeling in general, we want to model the relationship between an output variable, or a response, and one or more input variables, or factors.

Depending on the context, output variables might also be referred to as dependent variables, outcomes, or simply Y variables, and input variables might be referred to as explanatory variables, effects, predictors or X variables.

We can use regression, and the results of regression modeling, to determine which variables have an effect on the response or help explain the response. This is known as explanatory modeling.

We can also use regression to predict the values of a response variable based on the values of the important predictors. This is generally referred to as predictive modeling. Or, we can use regression models for optimization, to determine settings of factors to optimize a response. Our optimization goal might be to find settings that lead to a maximum response or to a minimum response. Or the goal might be to hit a target within an acceptable window.

For example, let’s say we’re trying to improve process yield.

  • We might use regression to determine which variables contribute to high yields,
  • We might be interested in predicting process yield for future production, given values of our predictors, or
  • We might want to identify factor settings that lead to optimal yields.

We might also use the knowledge gained through regression modeling to design an experiment that will refine our process knowledge and drive further improvement.

Linear regression example

Consider an example where we are interested in the cleaning of metal parts.

We have 50 parts with various inside diameters, outside diameters, and widths. Parts are cleaned using one of three container types. Cleanliness is a measure of the particulates on the parts. This is measured before and after running the parts through the cleaning process. The response of interest is Removal. This is the difference between pre-cleaning and post-cleaning measures.

We’re interested in whether the inside diameter, outside diameter, part width, and container type have an effect on the cleanliness, but we’re also interested in the nature of these effects. The relationship we develop linking the predictors to the response is a statistical model or, more specifically, a regression model.

The term regression describes a general collection of techniques used in modeling a response as a function of predictors. The only regression models that we'll consider in this discussion are linear models.

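An example of a linear model for the cleaning data is $ \widehat{\text{Removal}} = 10 + 1.2\,\text{OD} + 0.2\,\text{Width} $, where OD is the outside diameter. The intercept of 10 is the value implied by the two coefficients and the predicted removal quoted below.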

In this model, if the outside diameter increases by 1 unit, with the width remaining fixed, the removal increases by 1.2 units. Likewise, if the part width increases by 1 unit, with the outside diameter remaining fixed, the removal increases by 0.2 units. This model enables us to predict removal for parts with given outside diameters and widths.

For example, the predicted removal for parts with an outside diameter of 5 and a width of 3 is 16.6 units. In this example, we have two continuous predictors. When more than one predictor is used, the procedure is called multiple linear regression.
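The arithmetic behind this prediction can be sketched in a few lines. The coefficients 1.2 and 0.2 come from the text. The intercept of 10.0 is not stated explicitly and is an assumption here, inferred from the quoted prediction (16.6 - 1.2 x 5 - 0.2 x 3 = 10). The function name is hypothetical.

```python
def predict_removal(od, width, intercept=10.0, b_od=1.2, b_width=0.2):
    """Predicted Removal from outside diameter (od) and part width.

    intercept=10.0 is inferred from the worked prediction in the text,
    not quoted from it.
    """
    return intercept + b_od * od + b_width * width

prediction = predict_removal(od=5, width=3)
print(f"Predicted removal: {prediction:.1f}")

# Holding width fixed, a 1-unit increase in OD changes the prediction by b_od.
od_effect = predict_removal(6, 3) - predict_removal(5, 3)
print(f"Effect of +1 OD at fixed width: {od_effect:.1f}")
```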

When only one continuous predictor is used, we refer to the modeling procedure as simple linear regression. For the remainder of this discussion, we'll focus on simple linear regression.

A scatterplot indicates that there is a fairly strong positive relationship between Removal and OD (the outside diameter). To understand whether OD can be used to predict or estimate Removal, we fit a regression line: $ \widehat{\text{Removal}} = 4.099 + 0.528\,\text{OD} $. The fitted line estimates the mean of Removal for a given fixed value of OD. The value 4.099 is the intercept and 0.528 is the slope coefficient. The intercept, which is used to anchor the line, estimates Removal when the outside diameter is zero. Because diameter can’t be zero, the intercept isn’t of direct interest.

The slope coefficient estimates the average increase in Removal for a 1-unit increase in outside diameter. That is, for every 1-unit increase in outside diameter, Removal increases by 0.528 units on average.
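As a small illustration, the fitted line can be evaluated directly. The intercept 4.099 and slope 0.528 are the values from the text. The function name and the OD value of 10 are arbitrary choices for the example.

```python
INTERCEPT = 4.099  # b0 from the fitted line in the text
SLOPE = 0.528      # b1 from the fitted line in the text

def predicted_mean_removal(od):
    """Estimated mean Removal for a given outside diameter (OD)."""
    return INTERCEPT + SLOPE * od

at_10 = predicted_mean_removal(10)
at_11 = predicted_mean_removal(11)
print(f"OD = 10: {at_10:.3f}")
print(f"OD = 11: {at_11:.3f}")
print(f"Difference for a 1-unit increase: {at_11 - at_10:.3f}")
```

The difference between any two predictions one unit of OD apart is always the slope, which is the sense in which the slope is an average change per unit.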

The simple linear regression model

In the example above, we collected data on 50 parts. We fit a regression model to predict Removal as a function of the OD of the parts. But what if we had sampled a different set of 50 parts and fit a regression line using these data? Would this produce the same regression equation? By fitting a regression line to observed data, we are trying to estimate the true, unknown relationship between the variables. This fitted regression equation is just one estimate of the true linear model. In reality, the true linear model is unknown.

In simple linear regression we assume that, for a fixed value of a predictor X, the mean of the response Y is a linear function of X. We denote this unknown linear function by the equation $ \mu_{Y|X} = \beta_0 + \beta_1 X $, where $ \beta_0 $ is the intercept and $ \beta_1 $ is the slope. The regression line we fit to data is an estimate of this unknown function.

The equation of the fitted line is denoted by $ \hat{Y} = b_0 + b_1 X $.

Here, $ b_0 $ and $ b_1 $ are estimates of $ \beta_0 $ and $ \beta_1 $, respectively. The notation $ \hat{Y} $ (in this case, Y = Removal) indicates that the response is estimated from the data and that it is not an actual observation. In the cleaning example, the intercept, $ b_0 $, is 4.099 and the slope, $ b_1 $, is 0.528.
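The text does not say how the estimates are computed. Assuming ordinary least squares, the usual method for fitting a regression line, the slope is the ratio of the x-y cross-deviation sum to the x squared-deviation sum, and the intercept makes the line pass through the point of means. The sketch below applies those formulas to a small made-up data set. The data and the function name `fit_line` are illustrative assumptions, not the cleaning data.

```python
def fit_line(x, y):
    """Ordinary least squares estimates (b0, b1) for one predictor."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = sxy / sxx               # slope estimate
    b0 = mean_y - b1 * mean_x    # intercept estimate
    return b0, b1

# Hypothetical data, roughly following y = 2x.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = fit_line(x, y)
print(f"b0 = {b0:.2f}, b1 = {b1:.2f}")
```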

If we select a different sample of parts, our fitted line will be different. To illustrate, we use the Demonstrate Regression teaching module in the JMP sample scripts directory.
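For readers without JMP, the same idea can be sketched with a short simulation. Everything here is an assumption made for illustration: the "true" line (intercept 4.0, slope 0.5), the noise level, the range of x values, and the least squares fit. It is not a reproduction of the Demonstrate Regression module.

```python
import random

def fit_line(x, y):
    """Ordinary least squares estimates (b0, b1) for one predictor."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return mean_y - b1 * mean_x, b1

def simulate_fit(rng, n=50, beta0=4.0, beta1=0.5, noise_sd=1.0):
    """Draw one hypothetical sample of n parts and fit a line to it."""
    x = [rng.uniform(5.0, 15.0) for _ in range(n)]
    y = [beta0 + beta1 * xi + rng.gauss(0.0, noise_sd) for xi in x]
    return fit_line(x, y)

rng = random.Random(0)  # fixed seed so the run is repeatable
fits = [simulate_fit(rng) for _ in range(5)]
for b0, b1 in fits:
    print(f"b0 = {b0:.3f}, b1 = {b1:.3f}")
```

Each sample of 50 gives a slightly different intercept and slope, scattered around the true values used to generate the data.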

Regression vs. ANOVA

Let’s compare regression and ANOVA. In simple linear regression, both the response and the predictor are continuous. In ANOVA, the response is continuous, but the predictor, or factor, is nominal. The results are related statistically. In both cases, we’re building a general linear model. But the goals of the analysis are different.

Regression gives us a statistical model that enables us to predict a response at different values of the predictor, including values of the predictor not included in the original data.

ANOVA measures the mean shift in the response for the different categories of the factor. As such, it's generally used to compare means for the different levels of the factor.
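One way to see the statistical link between the two methods is the simplest case: a factor with only two levels. If the levels are coded 0 and 1 and a least squares line is fitted, the intercept equals the mean of the first group and the slope equals the difference between the group means. The data and the 0/1 coding below are assumptions for illustration, and the two-level case is a simplification of ANOVA in general.

```python
def fit_line(x, y):
    """Ordinary least squares estimates (b0, b1) for one predictor."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return mean_y - b1 * mean_x, b1

# Hypothetical responses for two levels of a nominal factor.
group_a = [10.0, 12.0, 11.0]
group_b = [14.0, 15.0, 16.0]

mean_a = sum(group_a) / len(group_a)
mean_b = sum(group_b) / len(group_b)

# Code the factor as 0 (level A) or 1 (level B) and fit a line.
x = [0] * len(group_a) + [1] * len(group_b)
y = group_a + group_b
b0, b1 = fit_line(x, y)

print(f"group means: A = {mean_a:.1f}, B = {mean_b:.1f}")
print(f"fitted line: b0 = {b0:.1f}, b1 = {b1:.1f}")
```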


FAQs

What is simple linear regression?

Simple linear regression aims to find a linear relationship that describes the association between an independent variable and a (possibly) dependent variable. The regression line can be used to predict or estimate missing values within the range of the observed data; this is known as interpolation.

How do you explain linear regression in simple terms?

Linear regression is a data analysis technique that predicts the value of unknown data by using another related and known data value. It mathematically models the unknown or dependent variable and the known or independent variable as a linear equation.

What is the difference between simple and multiple linear regression?

Simple linear regression has only one x and one y variable. Multiple linear regression has one y and two or more x variables.

How to interpret simple linear regression?

One common summary is the coefficient of determination, $ r^2 $. It is interpreted as the proportion of observed y variation that can be explained by the simple linear regression model (attributed to an approximate linear relationship between y and x). The higher the value of $ r^2 $, the more successful the simple linear regression model is in explaining y variation.
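The proportion-of-variation interpretation can be made concrete as one minus the ratio of the residual sum of squares to the total sum of squares. The sketch below assumes an ordinary least squares fit and uses made-up data. The function names are hypothetical.

```python
def fit_line(x, y):
    """Ordinary least squares estimates (b0, b1) for one predictor."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return mean_y - b1 * mean_x, b1

def r_squared(x, y):
    """Share of the variation in y explained by the fitted line."""
    b0, b1 = fit_line(x, y)
    mean_y = sum(y) / len(y)
    ss_total = sum((yi - mean_y) ** 2 for yi in y)
    ss_resid = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    return 1 - ss_resid / ss_total

# Hypothetical data, close to a straight line.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

r2 = r_squared(x, y)
print(f"r^2 = {r2:.4f}")
```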

Is simple linear regression hard?

No. Simple linear regression is a relatively easy concept to understand and apply. The resulting model is a straightforward equation that shows how one variable changes with another. This makes the results easier to explain and trust compared to those of more complex models.

How to explain regression in layman terms?

Regression, as fancy as it sounds, can be thought of as a “relationship” between any two things. For example, imagine you are standing on the ground and the temperature is 70°F. You start climbing a hill and, as you climb, you notice that you are feeling colder and the temperature is dropping. Regression describes how one quantity (temperature) changes as the other (height) changes.

What is an example of a simple linear regression?

Suppose a fitted equation relates weight (in pounds) to height (in inches): Weight = 80 + 2 x (Height). We could use the equation to predict weight if we knew an individual's height. In this example, if an individual was 70 inches tall, we would predict their weight to be: Weight = 80 + 2 x (70) = 220 lbs. In this simple linear regression, we are examining the impact of one independent variable on the outcome.

When to use simple linear regression?

Simple linear regression is used to estimate the relationship between two quantitative variables. You can use it when you want to know how strong the relationship is between two variables (e.g., the relationship between rainfall and soil erosion).

What is the difference between regression and simple linear regression?

Regression analysis is a general family of statistical methods, commonly used in fields such as finance and investing. Simple linear regression is one of the most common techniques within that family; it involves only two variables: the independent variable and the dependent variable.

How do you calculate simple linear regression?

The formula for simple linear regression is Y = mX + b, where Y is the response (dependent) variable, X is the predictor (independent) variable, m is the estimated slope, and b is the estimated intercept.

What is an example of a linear regression in real life?

A real-life example of simple linear regression is modeling the relationship between revenue and temperature, with revenue as the dependent variable. With multiple linear regression, you can model how temperature, pricing, and number of workers together relate to revenue.

How to report the results of a simple linear regression?

The report of the regression analysis should include the estimated effect of each explanatory variable – the regression slope or regression coefficient – with a 95% confidence interval, and a P-value. The P-value is for a test of the null hypothesis that the true regression coefficient is zero.

How do you explain linear regression to a child?

In more technical terms, we can say that linear regression helps us predict or estimate the value of one variable (like the crispiness of the bread) based on the value of another variable (such as the toasting time). This method is used to make informed predictions about one factor when we know the value of another.

What is the weakness of simple linear regression?

One of the main disadvantages of using linear regression for predictive analytics is that it is sensitive to outliers and noise. Outliers are data points that deviate significantly from the rest of the data, and noise is random variation or error in the data.
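The sensitivity to outliers is easy to demonstrate. In the sketch below, which assumes an ordinary least squares fit and uses made-up data, ten points lie exactly on the line y = 2x. Changing a single response value to an extreme one roughly doubles the fitted slope.

```python
def fit_line(x, y):
    """Ordinary least squares estimates (b0, b1) for one predictor."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    return mean_y - b1 * mean_x, b1

# Hypothetical data lying exactly on y = 2x.
x = list(range(1, 11))
y_clean = [2.0 * xi for xi in x]

# Same data, but the last response is replaced by an extreme value.
y_outlier = y_clean[:-1] + [60.0]

_, slope_clean = fit_line(x, y_clean)
_, slope_outlier = fit_line(x, y_outlier)
print(f"slope without outlier: {slope_clean:.3f}")
print(f"slope with outlier:    {slope_outlier:.3f}")
```

The outlier here sits at the largest x value, where a point has the most leverage on the slope. The same extreme response near the middle of the x range would shift the intercept more than the slope.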

What is a linear regression in layman's terms?

In statistics, linear regression is a statistical model which estimates the linear relationship between a scalar response (dependent variable) and one or more explanatory variables (regressor or independent variable).

What are the disadvantages of regression?

  • Multicollinearity: When independent variables are highly correlated, it becomes challenging to determine their individual impact on the dependent variable.
  • Outliers and influential points: Extreme data points can disproportionately affect regression results, leading to inaccurate conclusions.

Article information

Author: Jeremiah Abshire
