Simple Linear Regression (2024)

What is simple linear regression?

Simple linear regression is used to model the relationship between two continuous variables. Often, the objective is to predict the value of an output variable (or response) based on the value of an input (or predictor) variable.

When to use regression

We are often interested in understanding the relationships among several variables. Scatterplots and scatterplot matrices can be used to explore potential relationships between pairs of variables. Correlation provides a measure of the linear association between pairs of variables, but it doesn’t tell us about more complex relationships. For example, if the relationship is curvilinear, the correlation might be near zero.
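As a rough illustration of that last point, the Python sketch below computes the Pearson correlation for two small, made-up data sets: one where y is exactly linear in x, and one where y is exactly x squared. The data and the helper name `pearson_r` are assumptions chosen for illustration and are not part of the cleaning example.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: x values are symmetric around zero.
xs = [-3, -2, -1, 0, 1, 2, 3]
linear = [2 * x + 1 for x in xs]  # exactly linear in x
curved = [x ** 2 for x in xs]     # fully determined by x, but curvilinear

r_linear = pearson_r(xs, linear)
r_curved = pearson_r(xs, curved)

print(f"linear relationship: r = {r_linear:.3f}")
print(f"curved relationship: r = {r_curved:.3f}")
```

For the curved data the correlation is zero even though y is completely determined by x, because the positive and negative x values cancel out in the numerator.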

You can use regression to develop a more formal understanding of relationships between variables. In regression, and in statistical modeling in general, we want to model the relationship between an output variable, or a response, and one or more input variables, or factors.

Depending on the context, output variables might also be referred to as dependent variables, outcomes, or simply Y variables, and input variables might be referred to as explanatory variables, effects, predictors or X variables.

We can use regression, and the results of regression modeling, to determine which variables have an effect on the response or help explain the response. This is known as explanatory modeling.

Linear regression example

Consider an example where we are interested in the cleaning of metal parts.

We have 50 parts with various inside diameters, outside diameters, and widths. Parts are cleaned using one of three container types. Cleanliness is a measure of the particulates on the parts. This is measured before and after running the parts through the cleaning process. The response of interest is Removal. This is the difference between pre-cleaning and post-cleaning measures.

We’re interested in whether the inside diameter, outside diameter, part width, and container type have an effect on the cleanliness, but we’re also interested in the nature of these effects. The relationship we develop linking the predictors to the response is a statistical model or, more specifically, a regression model.

The term regression describes a general collection of techniques used in modeling a response as a function of predictors. The only regression models that we'll consider in this discussion are linear models.

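An example of a linear model for the cleaning data, using the outside diameter (OD) as the only predictor, is shown below.

$ \widehat{\text{Removal}} = 4.099 + 0.528 \times \text{OD} $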

The simple linear regression model

In the example above, we collected data on 50 parts. We fit a regression model to predict Removal as a function of the OD of the parts. But what if we had sampled a different set of 50 parts and fit a regression line using these data? Would this produce the same regression equation? By fitting a regression line to observed data, we are trying to estimate the true, unknown relationship between the variables. This fitted regression equation is just one estimate of the true linear model. In reality, the true linear model is unknown.

In simple linear regression we assume that, for a fixed value of a predictor X, the mean of the response Y is a linear function of X. We denote this unknown linear function by the equation below, where β₀ is the intercept and β₁ is the slope. The regression line we fit to data is an estimate of this unknown function.

$ \mu_{Y|X} = \beta_0 + \beta_1 X $

The fitted line is denoted by the following equation:

$ \hat{Y} = b_0 + b_1 X $

Here, b₀ and b₁ are estimates of β₀ and β₁, respectively. The notation $ \hat{Y} $ (in this case, Y = Removal) indicates that the response is estimated from the data and that it is not an actual observation. In the cleaning example, the intercept, b₀, is 4.099 and the slope, b₁, is 0.528.
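To make the fitted equation concrete, here is a minimal Python sketch that plugs the reported estimates into the line and predicts Removal for a few OD values. The function name `predict_removal` and the OD inputs are hypothetical, chosen only for illustration; only the intercept and slope come from the text.

```python
# Estimates reported in the text for the cleaning example.
B0 = 4.099  # intercept
B1 = 0.528  # slope: change in predicted Removal per unit increase in OD

def predict_removal(od):
    """Return the predicted Removal for a given outside diameter (OD)."""
    return B0 + B1 * od

# Hypothetical OD values, not taken from the actual data set.
for od in (5.0, 10.0, 15.0):
    print(f"OD = {od:5.1f} -> predicted Removal = {predict_removal(od):.3f}")
```

Each one-unit increase in OD raises the predicted Removal by the slope, 0.528. Whether a given OD value is a sensible input depends on the range of OD in the original data, which is not stated here.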

If we select a different sample of parts, our fitted line will be different. To illustrate, we use the Demonstrate Regression teaching module in the JMP sample scripts directory.
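The same idea can be sketched in code without the JMP teaching module. The Python example below simulates five samples of 50 parts from an assumed "true" line and fits a least-squares line to each sample. The true intercept, slope, noise level, and OD range are invented for illustration and are not taken from the cleaning data.

```python
import random

# Assumed "true" model, used only to simulate data (not from the cleaning study).
BETA0 = 4.0  # true intercept (assumption)
BETA1 = 0.5  # true slope (assumption)
SIGMA = 2.0  # standard deviation of the random error (assumption)

def fit_line(xs, ys):
    """Least-squares estimates (b0, b1) for the line y = b0 + b1 * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1

def draw_sample(rng, n=50):
    """Simulate n (x, y) pairs scattered around the assumed true line."""
    xs = [rng.uniform(5.0, 20.0) for _ in range(n)]
    ys = [BETA0 + BETA1 * x + rng.gauss(0.0, SIGMA) for x in xs]
    return xs, ys

rng = random.Random(1)  # fixed seed so the run is repeatable
fits = [fit_line(*draw_sample(rng)) for _ in range(5)]

for i, (b0, b1) in enumerate(fits, start=1):
    print(f"sample {i}: b0 = {b0:.3f}, b1 = {b1:.3f}")
```

Every sample gives a slightly different intercept and slope, all scattered around the assumed true values. This is the sense in which a fitted line is only one estimate of the true linear model.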

Regression vs. ANOVA

Let’s compare regression and ANOVA. In simple linear regression, both the response and the predictor are continuous. In ANOVA, the response is continuous, but the predictor, or factor, is nominal. The results are related statistically. In both cases, we’re building a general linear model. But the goals of the analysis are different.

Regression gives us a statistical model that enables us to predict a response at different values of the predictor, including values of the predictor not included in the original data.

ANOVA measures the mean shift in the response for the different categories of the factor. As such, it's generally used to compare means for the different levels of the factor.
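The difference in goals can be sketched with a small Python example. All values below are made up for illustration: the same hypothetical Removal measurements are summarized once with a fitted line in OD (the regression view) and once with the mean for each container type (the starting point of an ANOVA). A formal ANOVA would go on to test whether the group means differ, which is omitted here.

```python
from collections import defaultdict

def fit_line(xs, ys):
    """Least-squares estimates (b0, b1) for the line y = b0 + b1 * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    return mean_y - b1 * mean_x, b1

# Hypothetical measurements (not the actual cleaning data).
od = [6.0, 8.0, 10.0, 12.0, 14.0, 16.0]    # continuous predictor
container = ["A", "A", "B", "B", "C", "C"]  # nominal factor
removal = [7.0, 8.5, 9.0, 10.5, 11.5, 12.5]  # continuous response

# Regression view: predict the response at an OD value not in the data.
b0, b1 = fit_line(od, removal)
pred_at_11 = b0 + b1 * 11.0
print(f"predicted Removal at OD = 11: {pred_at_11:.3f}")

# ANOVA view: compare the mean response across the levels of the factor.
groups = defaultdict(list)
for level, value in zip(container, removal):
    groups[level].append(value)
group_means = {level: sum(vals) / len(vals) for level, vals in groups.items()}
for level, mean in sorted(group_means.items()):
    print(f"mean Removal for container {level}: {mean:.2f}")
```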

FAQs

How is simple linear regression defined?

Simple linear regression aims to find a linear relationship that describes the association between an independent variable and a dependent variable. The regression line can be used to predict or estimate values of the dependent variable. Estimating values within the observed range of the independent variable is known as interpolation.

How do you explain linear regression in simple terms?

Linear regression is a data analysis technique that predicts the value of unknown data by using another related and known data value. It mathematically models the unknown or dependent variable and the known or independent variable as a linear equation.

What is the difference between simple and multiple linear regression?

Simple linear regression has only one x and one y variable. Multiple linear regression has one y and two or more x variables.

How to interpret simple linear regression?

One key summary is the coefficient of determination, r². It is interpreted as the proportion of observed y variation that can be explained by the simple linear regression model (attributed to an approximate linear relationship between y and x). The higher the value of r², the more successful the simple linear regression model is in explaining y variation.
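As a minimal sketch of that interpretation, the Python code below computes r² as one minus the ratio of unexplained variation (the sum of squared residuals) to total variation in y. The data values and the helper names `fit_line` and `r_squared` are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares estimates (b0, b1) for the line y = b0 + b1 * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    return mean_y - b1 * mean_x, b1

def r_squared(xs, ys):
    """Proportion of the variation in ys explained by the fitted line."""
    b0, b1 = fit_line(xs, ys)
    mean_y = sum(ys) / len(ys)
    sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))  # unexplained
    sst = sum((y - mean_y) ** 2 for y in ys)                     # total
    return 1.0 - sse / sst

# Hypothetical data that follow a line closely but not exactly.
xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

r2 = r_squared(xs, ys)
print(f"r-squared = {r2:.4f}")
```

A value close to 1 means the line accounts for most of the variation in y. It does not by itself show that a straight line is the right form for the relationship.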

Is simple linear regression hard?

No. Simple linear regression is a relatively easy concept to understand and apply. The resulting model is a straightforward equation that shows how one variable changes with another. This makes the results easier to explain and trust than those of more complex models.

How to explain regression in layman terms?

Regression, as fancy as it sounds, can be thought of as a “relationship” between two things. For example, imagine you are standing on the ground and the temperature is 70°F. You start climbing a hill, and as you climb you notice that you feel colder because the temperature is dropping. Regression describes how the temperature changes as your height increases.

What is an example of a simple linear regression?

Suppose the fitted equation is Weight = 80 + 2 × Height, with height in inches and weight in pounds. We could use this equation to predict weight if we knew an individual's height. If an individual was 70 inches tall, we would predict a weight of 80 + 2 × 70 = 220 lbs. In this simple linear regression, we are examining the impact of one independent variable on the outcome.

When to use simple linear regression?

Simple linear regression is used to estimate the relationship between two quantitative variables. You can use it when you want to know how strong the relationship is between two variables (e.g., the relationship between rainfall and soil erosion).

What is the difference between regression and simple linear regression?

Regression analysis is a common statistical method, used in finance, investing, and many other fields. It covers a general collection of techniques for modeling a response as a function of predictors. Simple linear regression is one of the most common of these techniques; it involves only two variables: one independent variable and one dependent variable.

How do you calculate simple linear regression?

The formula for simple linear regression is Y = mX + b, where Y is the response (dependent) variable, X is the predictor (independent) variable, m is the estimated slope, and b is the estimated intercept.
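One common way to obtain m and b is ordinary least squares, which picks the line that minimizes the sum of squared vertical distances between the data and the line. The Python sketch below applies the standard closed-form formulas to made-up data; the function name `fit_line` and the data values are assumptions for illustration.

```python
def fit_line(xs, ys):
    """Return the least-squares slope m and intercept b for Y = mX + b.

    m = sum((x - mean_x) * (y - mean_y)) / sum((x - mean_x) ** 2)
    b = mean_y - m * mean_x
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    b = mean_y - m * mean_x
    return m, b

# Hypothetical data for illustration.
xs = [1, 2, 3, 4, 5]
ys = [2.2, 4.1, 6.1, 7.9, 10.2]

m, b = fit_line(xs, ys)
print(f"Y = {m:.2f} X + {b:.2f}")
```

The fitted line always passes through the point (mean of X, mean of Y), which is why the intercept can be computed from the slope and the two means.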

What is an example of a linear regression in real life?

A real-life example of simple linear regression is modeling the relationship between revenue and temperature, with revenue as the dependent variable. With multiple regression, you can model how temperature, pricing, and number of workers together relate to revenue.

How to report the results of a simple linear regression?

The report of the regression analysis should include the estimated effect of each explanatory variable – the regression slope or regression coefficient – with a 95% confidence interval, and a P-value. The P-value is for a test of the null hypothesis that the true regression coefficient is zero.
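For simple linear regression, these quantities follow from standard formulas, sketched below under the usual assumptions of independent, normally distributed errors with constant variance. The notation is introduced here for illustration: $ n $ is the number of observations, $ s $ is the residual standard error, and $ t_{0.975,\,n-2} $ is the 97.5th percentile of the t distribution with $ n - 2 $ degrees of freedom.

```latex
\begin{align*}
s^2 &= \frac{1}{n-2}\sum_{i=1}^{n}\bigl(y_i - b_0 - b_1 x_i\bigr)^2 \\
\mathrm{SE}(b_1) &= \frac{s}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2}} \\
\text{95\% confidence interval for the slope:}\quad & b_1 \pm t_{0.975,\,n-2}\,\mathrm{SE}(b_1) \\
\text{test statistic for a zero slope:}\quad & t = \frac{b_1}{\mathrm{SE}(b_1)}
\end{align*}
```

The P-value is the two-sided tail probability of $ t $ under a t distribution with $ n - 2 $ degrees of freedom.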

How do you explain linear regression to a child?

Linear regression helps us predict or estimate the value of one variable (like the crispiness of the bread) based on the value of another variable (such as the toasting time). This method is used to make informed predictions about one factor when we know the value of another.

What is the weakness of simple linear regression?

One of the main disadvantages of using linear regression for predictive analytics is that it is sensitive to outliers and noise. Outliers are data points that deviate significantly from the rest of the data, and noise is random variation or error in the data.
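The sensitivity to outliers can be seen in a small Python sketch. The data are made up: eight points lie exactly on the line y = 2x + 1, and a second copy of the data shifts only the last point upward by 40 units. The helper name `fit_line` is hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares estimates (b0, b1) for the line y = b0 + b1 * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    return mean_y - b1 * mean_x, b1

# Hypothetical data lying exactly on y = 2x + 1.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys_clean = [2 * x + 1 for x in xs]

# Same data, except the last point is shifted up by 40 units (an outlier).
ys_outlier = ys_clean[:-1] + [ys_clean[-1] + 40]

_, slope_clean = fit_line(xs, ys_clean)
_, slope_outlier = fit_line(xs, ys_outlier)

print(f"slope without outlier: {slope_clean:.3f}")
print(f"slope with one outlier: {slope_outlier:.3f}")
```

Changing a single point out of eight moves the estimated slope from 2 to about 5.33. The effect is large here because the outlier sits at the edge of the x range, where a point has the most leverage on the slope.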

What is a linear regression in layman's terms?

In statistics, linear regression is a statistical model which estimates the linear relationship between a scalar response (dependent variable) and one or more explanatory variables (regressor or independent variable).

What are the disadvantages of regression?

Multicollinearity: When independent variables are highly correlated, it becomes challenging to determine their individual impact on the dependent variable.

Outliers and influential points: Extreme data points can disproportionately affect regression results, leading to inaccurate conclusions.