Linear Regression - Data Science Discovery (2024)

Linear Regression

Linear regression fits a straight line that comes as close as possible to the points in a scatter plot. The most common technique chooses the line that minimizes the sum of the squared vertical distances between the points and the line. This is called OLS, or Ordinary Least Squares, regression.
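Here is a minimal Python sketch of the least squares idea. It scores a line by the sum of its squared vertical distances to the points. The dataset `xs`/`ys` is made up for illustration, and `sum_squared_residuals` is a hypothetical helper name. For this toy data the least squares line is ŷ = 2.2 + 0.6x. Nudging its intercept or slope slightly in either direction increases the sum.

```python
# Hypothetical toy data (not from the original article).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

def sum_squared_residuals(b0, b1, xs, ys):
    """Sum of squared vertical distances between the points and the line y = b0 + b1*x."""
    return sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))

# The least squares line for this toy data: intercept 2.2, slope 0.6.
ols_sse = sum_squared_residuals(2.2, 0.6, xs, ys)

# Nudging the intercept or the slope makes the sum of squares larger.
for b0, b1 in [(2.0, 0.6), (2.4, 0.6), (2.2, 0.5), (2.2, 0.7)]:
    print(b0, b1, sum_squared_residuals(b0, b1, xs, ys) > ols_sse)

print(round(ols_sse, 6))  # 2.4
```

Checking a few nearby lines illustrates the idea, but it is not a proof that 2.2 + 0.6x is the minimizer.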

We can find the equation of this line and use it to make predictions. Since our regression estimates form a straight line, we can describe them using an equation in slope-intercept form:

Regression Equation

$$\hat{y} = b_0 + b_1 x_1$$

When we use one x-variable (x1), also called the independent variable, to predict one y-variable, this is called simple linear regression. Here ŷ ("y-hat") denotes the predicted value of y. Using multiple independent variables to predict the y-variable is called multiple regression. For now, we focus on simple linear regression because its results are easy to interpret.

The Slope and Y Intercept of the Regression Line

In our regression equation, b0 is the y-intercept and b1 is the slope. Here's how you calculate the slope and y-intercept:

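$$b_1 = r \cdot \frac{\text{SD}_y}{\text{SD}_x}, \qquad b_0 = \bar{y} - b_1 \bar{x}$$

where r is the correlation coefficient between x and y, SD_x and SD_y are the standard deviations of x and y, and x̄ and ȳ are their means.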
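This sketch assumes the usual formulas: slope b1 = r × SD_y / SD_x and intercept b0 = ȳ − b1 × x̄, using population standard deviations. Under that assumption, the fit can be written in plain Python. The toy data and the helper names `correlation` and `fit_line` are hypothetical. Here r is computed as the average product of x and y in standard units.

```python
from statistics import mean, pstdev

def correlation(xs, ys):
    """Pearson correlation r: the average product of x and y in standard units."""
    mx, my = mean(xs), mean(ys)
    sx, sy = pstdev(xs), pstdev(ys)  # population SDs (divide by n)
    return mean(((x - mx) / sx) * ((y - my) / sy) for x, y in zip(xs, ys))

def fit_line(xs, ys):
    """Return (b0, b1): slope = r * SD_y / SD_x, intercept = mean(y) - slope * mean(x)."""
    r = correlation(xs, ys)
    b1 = r * pstdev(ys) / pstdev(xs)
    b0 = mean(ys) - b1 * mean(xs)
    return b0, b1

# Hypothetical toy data.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
b0, b1 = fit_line(xs, ys)
print(round(b0, 6), round(b1, 6))  # 2.2 0.6
```

On Python 3.10 or later, `statistics.linear_regression(xs, ys)` should return the same slope and intercept for this data.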

Here's how you interpret them:

  • SLOPE = the average change in Y associated with a 1-unit increase in X.
  • Y-INTERCEPT = the predicted value of Y when X is equal to 0.

To make predictions using the equation of the regression line, first find the slope and y-intercept. Then plug in values of x to get predicted values of y.
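Once b0 and b1 are known, prediction is a single line of arithmetic. The sketch below reuses the toy fit ŷ = 2.2 + 0.6x, which comes from hypothetical data. It also checks both interpretations of the slope and intercept. The prediction at x = 0 is the intercept, and a 1-unit increase in x changes the prediction by the slope. `predict` is a hypothetical helper name.

```python
def predict(b0, b1, x):
    """Predicted y (y-hat) from the regression line y-hat = b0 + b1 * x."""
    return b0 + b1 * x

# Intercept and slope from the toy fit (hypothetical data).
b0, b1 = 2.2, 0.6

print(round(predict(b0, b1, 3.5), 6))  # 4.3
# The intercept is the prediction at x = 0.
print(predict(b0, b1, 0) == b0)  # True
# Raising x by 1 changes the prediction by the slope.
print(round(predict(b0, b1, 4) - predict(b0, b1, 3), 6))  # 0.6
```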

Warning About Regression

When making predictions using regression, it's important to be aware of the following:

  • Predicting y at values of x beyond the range of x in the data is called extrapolation.
  • This is risky because we have no evidence to believe that the association between x and y remains linear for unseen values of x.
  • Extrapolated predictions can be badly wrong.
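One simple safeguard is to flag any prediction whose x lies outside the observed range of x. This is offered as an illustrative assumption, not as a procedure from the course. `predict_checked` is a hypothetical helper that returns the prediction together with an extrapolation flag.

```python
def predict_checked(b0, b1, x, xs):
    """Return (y_hat, is_extrapolation) for the line y-hat = b0 + b1 * x.

    is_extrapolation is True when x lies outside the range of the observed
    x-values, where the data give no evidence that the linear pattern holds.
    """
    is_extrapolation = not (min(xs) <= x <= max(xs))
    return b0 + b1 * x, is_extrapolation

xs = [1, 2, 3, 4, 5]  # hypothetical observed x-values

_, flag = predict_checked(2.2, 0.6, 4, xs)
print(flag)  # False: x = 4 is inside the observed range
_, flag = predict_checked(2.2, 0.6, 50, xs)
print(flag)  # True: x = 50 is far outside the observed range
```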

Residuals and RMSE

Unless the correlation is perfect, our predictions will not be perfect. Graphically, this means that for most of the points in a scatter plot, the actual y-value differs from the predicted y-value on the line. The difference between the actual value and the predicted value is called the residual or prediction error.

The residual is calculated as the actual value of y minus the predicted value of y (y − ŷ).

The residuals are the vertical distances between the points and the line.

  • If the point is above the regression line, the residual is positive.
  • If the point is below the regression line, the residual is negative.
  • If the point is exactly on the regression line, the residual is 0.
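The sign rules can be checked directly. This sketch computes residuals (actual minus predicted) for the hypothetical toy data and its least squares line ŷ = 2.2 + 0.6x. It then labels each point by the sign of its residual. The helper names `residuals` and `position` are hypothetical. The small tolerance in `position` is an added assumption, so that floating-point noise is not mistaken for a nonzero residual.

```python
def residuals(b0, b1, xs, ys):
    """Residual = actual y minus predicted y-hat, for each point."""
    return [y - (b0 + b1 * x) for x, y in zip(xs, ys)]

def position(residual, tol=1e-9):
    """Where a point sits relative to the line, based on its residual's sign."""
    if residual > tol:
        return "above"
    if residual < -tol:
        return "below"
    return "on the line"

# Hypothetical toy data and its least squares line.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

for x, y, e in zip(xs, ys, residuals(2.2, 0.6, xs, ys)):
    print(x, y, round(e, 2), position(e))
```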

Two Key Features of the Regression Line:

  1. For the least squares regression line, the average (and the sum) of the errors is always zero because the positive and negative errors cancel out.
  2. The SD of the errors, also called the Root Mean Square Error (RMSE), is a measure of the typical spread of the data around the regression line.
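Both features can be checked numerically on the hypothetical toy data. The errors from the least squares line average to zero, so their SD and their root mean square are the same number. The sketch computes both with Python's `statistics` and `math` modules.

```python
import math
from statistics import mean, pstdev

# Hypothetical toy data and its least squares line.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
b0, b1 = 2.2, 0.6

errors = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]

# Feature 1: the errors sum (and average) to zero, up to floating-point rounding.
print(abs(sum(errors)) < 1e-9)  # True

# Feature 2: with a zero mean, the SD of the errors equals their root mean square.
rms = math.sqrt(mean(e ** 2 for e in errors))
print(math.isclose(pstdev(errors), rms))  # True
print(round(rms, 4))  # 0.6928
```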

RMSE = SD of the errors: the SD of the prediction errors measures how accurate our predictions are. The better the predictions, the smaller the errors and the smaller the RMSE.

Rather than finding all the errors and then taking their root mean square, it's much easier to use the formula below. The RMSE is in the same units as the y-variable.

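$$\text{RMSE} = \sqrt{1 - r^2} \cdot \text{SD}_y$$

where r is the correlation coefficient between x and y and SD_y is the standard deviation of y.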
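This sketch assumes the shortcut formula RMSE = √(1 − r²) × SD_y with the population SD (dividing by n). It compares the shortcut with the direct route: compute every error, then take the root mean square. The toy data and the helper name `correlation` are hypothetical.

```python
import math
from statistics import mean, pstdev

def correlation(xs, ys):
    """Pearson correlation r: the average product of x and y in standard units."""
    mx, my, sx, sy = mean(xs), mean(ys), pstdev(xs), pstdev(ys)
    return mean(((x - mx) / sx) * ((y - my) / sy) for x, y in zip(xs, ys))

# Hypothetical toy data.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

r = correlation(xs, ys)
b1 = r * pstdev(ys) / pstdev(xs)
b0 = mean(ys) - b1 * mean(xs)

# Direct route: compute every error, then take the root mean square.
errors = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
rmse_direct = math.sqrt(mean(e ** 2 for e in errors))

# Shortcut: sqrt(1 - r^2) * SD_y, with the population SD.
rmse_shortcut = math.sqrt(1 - r ** 2) * pstdev(ys)

print(round(rmse_direct, 4), round(rmse_shortcut, 4))  # 0.6928 0.6928
```

With sample standard deviations (dividing by n − 1), the shortcut would no longer match the direct RMSE exactly. The choice of SD convention therefore matters here.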

Video 1: Simple Linear Regression

Follow along with the worksheet to work through the problem:

Video 2: Residuals and RMSE

Follow along with the worksheet to work through the problem:

Q1: Which one is better?

Q2: What is the Y-INTERCEPT for the given straight line?

Q3: Suppose we have clinical data for 400 patients and the task is to predict if a patient has cancer from the given data. Should we use linear regression in this situation?

Article information

Author: Kerri Lueilwitz
