Regression Analysis | Real Statistics Using Excel (2024)

The goal of regression analysis is to describe the relationship between two variables based on observed data and to predict the value of the dependent variable based on the value of the independent variable. Even though we can make such predictions, this doesn’t imply that we can claim any causal relationship between the independent and dependent variables.

Definition 1: If y is a dependent variable and x is an independent variable, then the linear regression model provides a prediction of y from x of the form

$$y = \alpha + \beta x + \varepsilon$$

where α + βx is the deterministic portion of the model and ε is the random error. We further assume that for any given value of x the random error ε is normally and independently distributed with mean zero.

Observation: In practice, we build the linear regression model from the sample data using the least-squares method. Thus we seek coefficients a and b such that

$$\hat{y} = a + bx$$

For the data in our sample we will have

$$\hat{y}_i = a + b x_i$$

where ŷi is the y value predicted by the model at xi. Thus the error term for the model is given by

$$e_i = y_i - \hat{y}_i = y_i - a - b x_i$$

Example 1: For each x value in the sample data from Example 1 of One Sample Hypothesis Testing for Correlation, find the predicted value ŷ corresponding to x, i.e. the value of y on the regression line corresponding to x. Also find the predicted life expectancy of men who smoke 4, 24 and 44 cigarettes based on the regression model.

Figure 1 – Obtaining predicted values for data in Example 1

The predicted values can be obtained using the fact that for any i, the point (xi, ŷi) lies on the regression line and so ŷi = a + bxi. E.g. cell K5 in Figure 1 contains the formula =I5*E4+E5, where I5 contains the first x value 5, E4 contains the slope b and E5 contains the y-intercept a (referring to the worksheet in Figure 1 of Method of Least Squares). Alternatively, this value can be obtained by using the formula =FORECAST(I5,J5:J19,I5:I19). In fact, all the predicted y values can be obtained at once by using the array function TREND. This is done by highlighting the range K5:K19, entering the array formula =TREND(J5:J19,I5:I19) and then pressing Ctrl-Shift-Enter.

The predicted values for x = 4, 24 and 44 can be obtained in a similar manner using any of the three methods described above. For TREND, the second form of the formula is used. E.g. to obtain the predicted values for x = 4, 24 and 44 (stored in N19:N21), highlight range O19:O21, enter the array formula =TREND(J5:J19,I5:I19,N19:N21) and then press Ctrl-Shift-Enter. Note that these approaches yield predicted values even for values of x that are not in the sample (such as 24 and 44). The predicted life expectancy for men who smoke 4, 24 and 44 cigarettes is 83.2, 70.6 and 58.1 years respectively.
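For readers who want to check the spreadsheet logic outside Excel, the following is a minimal Python sketch of the same idea: fit a and b by least squares, then evaluate a + bx at sample and non-sample x values, roughly as FORECAST and TREND do. The function names `fit_line` and `predict` are illustrative, and the data are made up for the sketch; they are not the data from Example 1.

```python
from statistics import mean


def fit_line(xs, ys):
    """Return (a, b) for the least-squares line y-hat = a + b*x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b = sxy / sxx            # slope
    a = y_bar - b * x_bar    # intercept
    return a, b


def predict(a, b, new_xs):
    """Evaluate the fitted line at each x (works for x values outside the sample too)."""
    return [a + b * x for x in new_xs]


# Made-up illustrative data (NOT the life-expectancy data from Example 1)
xs = [0, 10, 20, 30, 40]
ys = [85, 80, 73, 68, 60]

a, b = fit_line(xs, ys)
fitted = predict(a, b, xs)             # analogous to =TREND(ys, xs)
new_preds = predict(a, b, [4, 24, 44])  # analogous to =TREND(ys, xs, new_xs)

print(f"intercept a = {a:.2f}, slope b = {b:.2f}")
print([round(p, 2) for p in new_preds])
```

As in the worksheet, nothing restricts `predict` to x values that appear in the sample.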

Definition 2: We use the following terminology:

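The Residual is the error term of Definition 1. We also define the degrees of freedom dfT, dfReg, dfRes, the sums of squares SST, SSReg, SSRes and the mean squares MST, MSReg, MSRes as follows (n is the sample size):

$$df_T = n - 1 \qquad df_{Reg} = 1 \qquad df_{Res} = n - 2$$

$$SS_T = \sum_{i=1}^n (y_i - \bar{y})^2 \qquad SS_{Reg} = \sum_{i=1}^n (\hat{y}_i - \bar{y})^2 \qquad SS_{Res} = \sum_{i=1}^n (y_i - \hat{y}_i)^2$$

$$MS_T = \frac{SS_T}{df_T} \qquad MS_{Reg} = \frac{SS_{Reg}}{df_{Reg}} \qquad MS_{Res} = \frac{SS_{Res}}{df_{Res}}$$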

Property 1: $SS_T = SS_{Reg} + SS_{Res}$

Observation: SST is the total variability of y (e.g. the variability of life expectancy in Example 1 of One Sample Hypothesis Testing for Correlation). SSReg represents the variability of y that can be explained by the regression model (i.e. the variability in life expectancy that can be explained by the number of cigarettes smoked), and so by Property 1, SSRes expresses the variability of y that can’t be explained by the regression model.

Thus SSReg/SST represents the percentage of the variability of y that can be explained by the regression model. It turns out that this is equal to the coefficient of determination.
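The decomposition can be checked numerically. The sketch below assumes the usual definitions for simple linear regression (SST as the sum of squared deviations of y from its mean, SSReg as the sum of squared deviations of ŷ from that mean, and SSRes as the sum of squared residuals) and uses made-up data; the function name `sums_of_squares` is illustrative.

```python
from statistics import mean


def sums_of_squares(xs, ys):
    """Return (SST, SSReg, SSRes) for the least-squares line through (xs, ys)."""
    x_bar, y_bar = mean(xs), mean(ys)
    b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
         / sum((x - x_bar) ** 2 for x in xs))
    a = y_bar - b * x_bar
    y_hat = [a + b * x for x in xs]

    ss_t = sum((y - y_bar) ** 2 for y in ys)                 # total variability of y
    ss_reg = sum((yh - y_bar) ** 2 for yh in y_hat)          # explained by the model
    ss_res = sum((y - yh) ** 2 for y, yh in zip(ys, y_hat))  # left unexplained
    return ss_t, ss_reg, ss_res


# Made-up illustrative data (not from the worksheet in Figure 1)
xs = [0, 10, 20, 30, 40]
ys = [85, 80, 73, 68, 60]

ss_t, ss_reg, ss_res = sums_of_squares(xs, ys)
r_squared = ss_reg / ss_t  # share of the variability of y explained by the model

print(f"SST = {ss_t:.2f}, SSReg = {ss_reg:.2f}, SSRes = {ss_res:.2f}")
print(f"SSReg + SSRes = {ss_reg + ss_res:.2f}, SSReg/SST = {r_squared:.4f}")
```

Up to floating-point rounding, the two components add back to SST, and the ratio SSReg/SST lies between 0 and 1.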

Property 2: $r^2 = SS_{Reg} / SS_T$

Property 3:

Observation: Note that for a sample size of 100, a correlation coefficient as low as .197 will result in the null hypothesis that the population correlation coefficient is 0 being rejected (per Theorem 1 of One Sample Hypothesis Testing for Correlation). But when the correlation coefficient r = .197, then r² = .039, which means that the model variance SSReg is less than 4% of the total variance SST, which is quite a small association indeed. Although this effect is “significant”, it certainly isn’t very “large”.
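The arithmetic behind this remark can be sketched as follows. The sketch assumes that the test in question is the usual t test for a correlation coefficient, with t = r·√(n − 2)/√(1 − r²) on n − 2 degrees of freedom, and a two-tailed significance level of 5%; the critical value is hard-coded from a t table as an approximation rather than computed.

```python
import math


def correlation_t_statistic(r, n):
    """t statistic for testing H0: population correlation = 0 (n - 2 degrees of freedom)."""
    return r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)


r, n = 0.197, 100       # values quoted in the text
T_CRIT_98_DF = 1.9845   # assumption: approximate two-tailed 5% critical value for 98 df

t = correlation_t_statistic(r, n)
r_squared = r ** 2

print(f"t = {t:.3f}, exceeds critical value: {abs(t) > T_CRIT_98_DF}")
print(f"r^2 = {r_squared:.3f} (share of SST explained by the model)")
```

The statistic only just clears the critical value, while r² stays below 4%, which is the contrast between “significant” and “large” drawn above.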

Observation: From Property 2, we see that the coefficient of determination r² is a measure of the accuracy of the prediction of the linear regression model. r² has a value between 0 and 1, with 1 indicating a perfect fit between the linear regression model and the data.

Property 4:

Definition 3: The standard error of the estimate is defined as

$$s_{y \cdot x} = \sqrt{MS_{Res}}$$

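Observation: The second assertion in Property 4 can be restated as

$$s_{y \cdot x} = s_y \sqrt{(1 - r^2)\,\frac{n-1}{n-2}}$$

For large samples (n − 1)/(n − 2) ≈ 1 and so

$$s_{y \cdot x} \approx s_y \sqrt{1 - r^2}$$

Note that if r = .5, then

$$s_{y \cdot x} \approx s_y \sqrt{1 - .25} = .866 \cdot s_y$$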

which indicates that the standard error of the estimate is still 86.6% of the standard error that doesn’t factor in any information about x; i.e. having information about x only reduces the error by 13.4%. Even if r = .9, then sy.x = .436·sy, which indicates that information about x reduces the standard error (with no information about x) by only a little over 50%.
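The figures quoted here follow from the large-sample approximation that the standard error of the estimate is roughly s_y·√(1 − r²). A small Python sketch tabulates that ratio for a few correlation values; the function name `error_ratio` and the extra value r = .99 are illustrative additions.

```python
import math


def error_ratio(r):
    """Large-sample ratio of the standard error of the estimate to s_y: sqrt(1 - r^2)."""
    return math.sqrt(1 - r ** 2)


for r in (0.5, 0.9, 0.99):
    ratio = error_ratio(r)
    print(f"r = {r}: ratio = {ratio:.3f}, reduction in standard error = {1 - ratio:.1%}")
```

Even a correlation that sounds strong leaves a sizeable part of the original standard error in place, because the ratio falls slowly until r gets close to 1.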

Property 5:

a) The sum of the y values is equal to the sum of the ŷ values; i.e. $\sum_{i=1}^n y_i = \sum_{i=1}^n \hat{y}_i$

b) The means of the y values and ŷ values are equal; i.e. ȳ = the mean of the ŷi

c) The sum of the error terms is 0; i.e. $\sum_{i=1}^n e_i = 0$

d) The correlation coefficient of x with ŷ is sign(b); i.e. $r_{x\hat{y}} = \mathrm{sign}(r_{xy})$

e) The correlation coefficient of y with ŷ is the absolute value of the correlation coefficient of x with y; i.e. $r_{y\hat{y}} = |r_{xy}|$

f) The coefficient of determination of y with ŷ is the same as the coefficient of determination of x with y; i.e. $r_{y\hat{y}}^2 = r_{xy}^2$
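These identities are easy to spot-check numerically. The sketch below fits a least-squares line to made-up data with a negative slope and tests each part of Property 5 up to floating-point tolerance; the helper `pearson` and the data are illustrative, and a numerical check of this kind is of course not a proof.

```python
import math
from statistics import mean


def pearson(us, vs):
    """Sample correlation coefficient of two equal-length sequences."""
    u_bar, v_bar = mean(us), mean(vs)
    cov = sum((u - u_bar) * (v - v_bar) for u, v in zip(us, vs))
    var_u = sum((u - u_bar) ** 2 for u in us)
    var_v = sum((v - v_bar) ** 2 for v in vs)
    return cov / math.sqrt(var_u * var_v)


# Made-up illustrative data with a negative slope
xs = [0, 10, 20, 30, 40]
ys = [85, 80, 73, 68, 60]

# Least-squares fit y-hat = a + b*x
x_bar, y_bar = mean(xs), mean(ys)
b = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
     / sum((x - x_bar) ** 2 for x in xs))
a = y_bar - b * x_bar
y_hat = [a + b * x for x in xs]
residuals = [y - yh for y, yh in zip(ys, y_hat)]

r_xy = pearson(xs, ys)
checks = {
    "a": math.isclose(sum(ys), sum(y_hat)),
    "b": math.isclose(mean(ys), mean(y_hat)),
    "c": math.isclose(sum(residuals), 0.0, abs_tol=1e-9),
    "d": math.isclose(pearson(xs, y_hat), math.copysign(1.0, b)),
    "e": math.isclose(pearson(ys, y_hat), abs(r_xy)),
    "f": math.isclose(pearson(ys, y_hat) ** 2, r_xy ** 2),
}
print(checks)
```

With a negative slope, part d) gives a correlation of −1 between x and ŷ, while part e) still gives a positive correlation between y and ŷ.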

Observation: Click here for the proofs of the various properties described above.


FAQs

How do you do regression statistics in Excel?

Click the “Data” tab, and then choose “Data Analysis”. You will now see a window listing the various statistical tools that Excel provides. Scroll down to find the Regression option and click “OK”. Now input the cells containing your data.

How to do ANOVA regression in Excel?

How to use two-way ANOVA in Excel
  1. Click the Data tab.
  2. Click Data Analysis.
  3. Select Anova: Two Factor with Replication and click OK.
  4. Next to Input Range, click the up arrow.
  5. Select the data and click the down arrow.
  6. In Rows per sample, enter the number of measurements in the group, then click OK to run.

How to use the regression equation to make predictions in Excel?

In the Data Analysis dialog box, select "Regression" and click "OK". In the Regression dialog box, click the "Input Y Range" box and select the dependent variable data (in this example, Visa (V) stock returns). Click the "Input X Range" box and select the independent variable data (in this example, S&P 500 returns). Click "OK" to run the regression.

Can Excel do multivariate regression?

Excel has a built-in data analysis tool that you can use to conduct multivariate regression analysis. To access this tool, you first need to enable it. To do this, go to the “Tools” menu, select “Excel Add-ins”, check the “Analysis ToolPak” box and click “OK”.

What is the formula for calculating regression in statistics?

The formula for simple linear regression is Y = mX + b, where Y is the response (dependent) variable, X is the predictor (independent) variable, m is the estimated slope, and b is the estimated intercept.

What does a regression analysis tell you?

Typically, a regression analysis is done for one of two purposes: In order to predict the value of the dependent variable for individuals for whom some information concerning the explanatory variables is available, or in order to estimate the effect of some explanatory variable on the dependent variable.

Can Excel do predictive analysis?

Yes, predictive modeling can be performed in Excel using tools like regression analysis, trendline fitting, and predictive functions.

What is the formula for regression analysis in forecasting?

The simplest regression analysis models the relationship between two variables using the following equation: Y = a + bX, where Y is the dependent variable and X is the independent variable. Notice that this simple equation denotes a "linear" relationship between X and Y.

How to do predictive analysis through regression?

Comparing the standard error of the estimate (SEE) for different models fitted to the same sample allows you to determine the most accurate model to use for prediction. SEE % is calculated by dividing the SEE by the mean of the criterion (SEE/mean criterion) and can be used to compare different models derived from different samples.

When not to use multivariate regression?

Multivariate regression analysis is not recommended for small samples. The outcome variables should be at least moderately correlated for the multivariate regression analysis to make sense.

How to interpret regression statistics?

Interpreting Linear Regression Coefficients

A positive coefficient indicates that as the value of the independent variable increases, the mean of the dependent variable also tends to increase. A negative coefficient suggests that as the independent variable increases, the dependent variable tends to decrease.

When should I use multivariate regression?

Multivariate regression is an extension of simple linear regression. It is used when we want to predict the value of a variable based on the value of two or more different variables.

How does Excel calculate regression line?

In regression analysis, Excel calculates for each point the squared difference between the y-value estimated for that point and its actual y-value. The sum of these squared differences is called the residual sum of squares, ssresid. Excel then calculates the total sum of squares, sstotal.

How do you do data regression?

It consists of 3 stages – (1) analyzing the correlation and directionality of the data, (2) estimating the model, i.e., fitting the line, and (3) evaluating the validity and usefulness of the model. First, a scatter plot should be used to analyze the data and check for directionality and correlation of data.

How to calculate R² in Excel?

You can use the RSQ() function to calculate R² in Excel. If your dependent variable is in column A and your independent variable is in column B, then click any blank cell and type “=RSQ(A:A,B:B)”.

How to do statistics in Excel?

To calculate descriptive statistics for a column of data, click on the Data ribbon. Click on Data Analysis in the Analysis section. Select Descriptive Statistics, then click OK. Click on the Input Range selection button, then select the range of cells for the column.


Author: Domingo Moore
