Correlation (2024)

What is correlation?

Correlation is a statistical measure that expresses the extent to which two variables are linearly related (meaning they change together at a constant rate). It’s a common tool for describing simple relationships without making a statement about cause and effect.

How is correlation measured?

The sample correlation coefficient, r, quantifies the strength and direction of the linear relationship. Correlations are also tested for statistical significance.

What are some limitations of correlation analysis?

Correlation can’t account for the presence or effect of variables other than the two being explored. Importantly, correlation doesn’t tell us about cause and effect. Correlation also cannot accurately describe curvilinear relationships.

Correlations describe data moving together

Correlations are useful for describing simple relationships among data. For example, imagine that you are looking at a dataset of campsites in a mountain park. You want to know whether there is a relationship between the elevation of the campsite (how high up the mountain it is), and the average high temperature in the summer.

For each individual campsite, you have two measures: elevation and temperature. When you compare these two variables across your sample with a correlation, you can find a linear relationship: as elevation increases, the temperature drops. They are negatively correlated.

What do correlation numbers mean?

We describe correlations with a unit-free measure called the correlation coefficient, which ranges from -1 to +1 and is denoted by r. Statistical significance is indicated with a p-value. Therefore, correlations are typically written with two key numbers: r = and p = .

  • The closer r is to zero, the weaker the linear relationship.
  • Positive r values indicate a positive correlation, where the values of both variables tend to increase together.
  • Negative r values indicate a negative correlation, where the values of one variable tend to increase when the values of the other variable decrease.
  • The p-value helps us judge whether we can meaningfully conclude that the population correlation coefficient differs from zero, based on what we observe in the sample.
  • "Unit-free measure" means that correlations exist on their own scale: in our example, the number given for r is not on the same scale as either elevation or temperature. This is different from other summary statistics. For instance, the mean of the elevation measurements is on the same scale as its variable.
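
To make the bullets above concrete, here is a minimal sketch that computes r by hand and checks the "unit-free" property. The campsite numbers are invented for illustration, and `pearson_r` is a helper name we made up; statistical software computes the same quantity for you.

```python
import math

def pearson_r(x, y):
    """Sample Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Cross-products and squared deviations from the two means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical campsite data (not from a real park).
elevation_m = [500, 800, 1100, 1400, 1700, 2000]
high_temp_c = [29.0, 27.5, 25.0, 24.0, 21.5, 20.0]

r = pearson_r(elevation_m, high_temp_c)

# Same campsites, different units: feet and degrees Fahrenheit.
elevation_ft = [m * 3.28084 for m in elevation_m]
high_temp_f = [c * 9 / 5 + 32 for c in high_temp_c]
r_other_units = pearson_r(elevation_ft, high_temp_f)

print(f"r (metres, Celsius)  = {r:.3f}")
print(f"r (feet, Fahrenheit) = {r_other_units:.3f}")
```

Both lines print the same negative value: changing the measurement units changes the means, but not r.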

What is a p-value?

A p-value is a measure of probability used for hypothesis testing.

It is the probability of obtaining test results equal to or more extreme than what was observed, assuming that no effect is actually present – in other words, assuming that the null hypothesis is true. For our campsite data, the null hypothesis is that there is no linear relationship between elevation and temperature. A small p-value suggests that the observed data is unlikely under the null hypothesis. When a p-value is used to describe a result as statistically significant, this means that it falls below a pre-defined cutoff (e.g., p < .05 or p < .01) at which point we reject the null hypothesis in favor of an alternative hypothesis (for our campsite data, that there is a relationship between elevation and temperature).
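
One way to see what "equal to or more extreme, assuming the null hypothesis is true" means is a permutation test. Note that this is a swapped-in technique for illustration: statistical software usually reports a p-value from a t-distribution instead. The sketch below uses invented campsite data, and the function names, the seed, and the number of shuffles are all our own assumptions.

```python
import math
import random

def pearson_r(x, y):
    """Sample Pearson correlation coefficient."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def permutation_p_value(x, y, n_permutations=10_000, seed=0):
    """Two-sided Monte Carlo permutation p-value for 'no linear relationship'."""
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    observed = abs(pearson_r(x, y))
    shuffled = list(y)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        # Shuffling y breaks any real pairing with x, which mimics the null hypothesis.
        rng.shuffle(shuffled)
        # Small tolerance so floating-point ties still count as "equal".
        if abs(pearson_r(x, shuffled)) >= observed - 1e-12:
            at_least_as_extreme += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (at_least_as_extreme + 1) / (n_permutations + 1)

# Hypothetical campsite data (not from a real park).
elevation_m = [500, 800, 1100, 1400, 1700, 2000]
high_temp_c = [29.0, 27.5, 25.0, 24.0, 21.5, 20.0]

p = permutation_p_value(elevation_m, high_temp_c)
print(f"p is roughly {p:.4f}")
```

Very few random pairings produce a correlation as strong as the observed one, so the estimated p-value falls well below the usual .05 cutoff.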

Once we’ve obtained a significant correlation, we can also look at its strength. A perfect positive correlation has a value of 1, and a perfect negative correlation has a value of -1. But in the real world, we would never expect to see a perfect correlation unless one variable is actually a proxy measure for the other. In fact, seeing a perfect correlation number can alert you to an error in your data! For example, if you accidentally recorded distance from sea level for each campsite instead of temperature, this would correlate perfectly with elevation.
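
As a rough sketch of that kind of sanity check, the snippet below flags a correlation that is suspiciously close to perfect. The data, the `mistaken_column` name, and the 0.9999 threshold are all hypothetical choices for illustration.

```python
import math

def pearson_r(x, y):
    """Sample Pearson correlation coefficient."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical elevations in metres.
elevation_m = [500, 800, 1100, 1400, 1700, 2000]

# Suppose the "temperature" column was filled by mistake with
# distance above sea level in feet, which is just elevation again.
mistaken_column = [m * 3.28084 for m in elevation_m]

r = pearson_r(elevation_m, mistaken_column)
looks_like_a_proxy = abs(r) > 0.9999  # hypothetical alert threshold

print(f"r = {r:.6f}, suspicious: {looks_like_a_proxy}")
```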

Another useful piece of information is the N, or number of observations. As with most statistical tests, knowing the size of the sample helps us judge how reliable our estimate is and how well the sample represents the population. For example, if we only measured elevation and temperature for five campsites, but the park has two thousand campsites, we’d want to add more campsites to our sample.
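
To see why N matters, we can sketch an approximate confidence interval for the population correlation using the Fisher z-transformation. This is a standard large-sample approximation that assumes roughly bivariate-normal data. The value r = -0.6 and the sample sizes are invented, and `fisher_ci` is our own helper name.

```python
import math
from statistics import NormalDist

def fisher_ci(r, n, confidence=0.95):
    """Approximate confidence interval for a correlation via Fisher's z."""
    if n <= 3:
        raise ValueError("the approximation needs more than 3 observations")
    if not -1 < r < 1:
        raise ValueError("r must lie strictly between -1 and 1")
    z = math.atanh(r)                 # Fisher z-transformation of r
    se = 1 / math.sqrt(n - 3)         # approximate standard error of z
    z_crit = NormalDist().inv_cdf(0.5 + confidence / 2)
    # Transform the interval endpoints back to the correlation scale.
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)

# The same observed r, with three hypothetical sample sizes.
intervals = {n: fisher_ci(-0.6, n) for n in (5, 50, 500)}
for n, (low, high) in intervals.items():
    print(f"N = {n:3d}: 95% CI roughly ({low:.2f}, {high:.2f})")
```

With only five campsites the interval is so wide that it includes zero. With more campsites it narrows around the observed value.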

Visualizing correlations with scatterplots

Back to our example from above: as campsite elevation increases, temperature drops. We can look at this directly with a scatterplot. Imagine that we’ve plotted our campsite data:

  • Each point in the plot represents one campsite, which we can place on an x- and y-axis by its elevation and summertime high temperature.
  • The correlation coefficient (r) summarizes the scatterplot numerically: it tells us how closely the plotted points follow a straight line. Stronger relationships, meaning r values farther from zero in either direction, are those where the points lie very close to the line we’ve fit to the data.
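
The link between r and the fitted line can be sketched in a few lines. Below, we fit an ordinary least-squares line to invented campsite data and check that r squared equals the share of the variation in temperature that the line accounts for. The helper name `fit_line` and the data are assumptions for illustration.

```python
import math

def fit_line(x, y):
    """Least-squares slope and intercept, plus the Pearson r of the same data."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / math.sqrt(sxx * syy)
    return slope, intercept, r

# Hypothetical campsite data (not from a real park).
elevation_m = [500, 800, 1100, 1400, 1700, 2000]
high_temp_c = [29.0, 27.5, 25.0, 24.0, 21.5, 20.0]

slope, intercept, r = fit_line(elevation_m, high_temp_c)

# How far each point sits from the fitted line, squared and summed.
residual_ss = sum(
    (t - (intercept + slope * e)) ** 2 for e, t in zip(elevation_m, high_temp_c)
)
mean_temp = sum(high_temp_c) / len(high_temp_c)
total_ss = sum((t - mean_temp) ** 2 for t in high_temp_c)
explained_share = 1 - residual_ss / total_ss

print(f"slope = {slope:.4f} degrees C per metre")
print(f"r^2 = {r ** 2:.4f}, explained share = {explained_share:.4f}")
```

The smaller the distances from the points to the line, the closer r squared gets to 1.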

What about more complex relationships?

Scatterplots are also useful for determining whether there is anything in our data that might disrupt an accurate correlation, such as unusual patterns like a curvilinear relationship or an extreme outlier.

Correlations can’t accurately capture curvilinear relationships. In a curvilinear relationship, variables are correlated in a given direction until a certain point, where the relationship changes.

For example, imagine that we looked at our campsite elevations and how highly campers rate each campsite, on average. Perhaps at first, elevation and campsite ratings are positively correlated, because higher campsites get better views of the park. But at a certain point, higher elevations become negatively correlated with campsite ratings, because campers feel cold at night!
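
A small sketch shows how badly a single r can summarize this pattern. The ratings below are invented so that they rise and then fall symmetrically with elevation. The overall correlation comes out essentially zero even though each half of the data has a strong relationship.

```python
import math

def pearson_r(x, y):
    """Sample Pearson correlation coefficient."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical data: ratings peak at mid elevations, then drop.
elevation_m = [500, 800, 1100, 1400, 1700, 2000]
avg_rating = [2.0, 3.5, 4.5, 4.5, 3.5, 2.0]

r_all = pearson_r(elevation_m, avg_rating)
r_lower = pearson_r(elevation_m[:3], avg_rating[:3])   # below the peak
r_upper = pearson_r(elevation_m[3:], avg_rating[3:])   # above the peak

print(f"all campsites:   r = {r_all:.3f}")
print(f"lower campsites: r = {r_lower:.3f}")
print(f"upper campsites: r = {r_upper:.3f}")
```

An r near zero here does not mean elevation and rating are unrelated. It means the relationship is not linear, which is why plotting the data first is so useful.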

We can get even more insight by adding shaded density ellipses to our scatterplot. A density ellipse illustrates the densest region of the points in a scatterplot, which in turn helps us see the strength and direction of the correlation.

Density ellipses can be various sizes. One common choice for examining correlation is a 95% density ellipse, which captures approximately the densest 95% of the observations. If two variables are moving together, like our campsites’ elevation and temperature, we would expect to see this density ellipse mirror the shape of the line. In a curvilinear relationship, by contrast, the density ellipse looks round: a correlation won’t give us a meaningful description of this relationship.
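
To connect the shape of the ellipse to r, here is a minimal sketch under two stated assumptions: both variables are standardized (mean 0, standard deviation 1) and the data are roughly bivariate normal. It uses a large-sample chi-square scaling, so plotting software may draw slightly different ellipses. The function name is our own.

```python
import math

def standardized_density_ellipse(r, coverage=0.95):
    """Semi-axis lengths and tilt of a density ellipse for standardized data."""
    if not -1 <= r <= 1:
        raise ValueError("r must be between -1 and 1")
    if not 0 < coverage < 1:
        raise ValueError("coverage must be between 0 and 1")
    # Chi-square quantile with 2 degrees of freedom has this closed form.
    scale = -2.0 * math.log(1.0 - coverage)
    # For standardized variables the covariance eigenvalues are 1 + |r| and 1 - |r|.
    long_axis = math.sqrt(scale * (1 + abs(r)))
    short_axis = math.sqrt(scale * (1 - abs(r)))
    # Positive r tilts the ellipse along y = x, negative r along y = -x.
    # When r is 0 the ellipse is a circle, so the tilt carries no meaning.
    tilt_deg = 45.0 if r >= 0 else -45.0
    return long_axis, short_axis, tilt_deg

for r in (0.0, -0.5, -0.9):
    long_axis, short_axis, tilt = standardized_density_ellipse(r)
    print(f"r = {r:+.1f}: short/long axis ratio = {short_axis / long_axis:.2f}")
```

When r is zero, the two axes are equal and the ellipse is a circle. As r moves toward -1 or +1, the ellipse becomes long and thin.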

Article information

Author: Errol Quitzon
