Hypothesis Testing, P Values, Confidence Intervals, and Significance (2024)

Definition/Introduction

Medical providers often rely on evidence-based medicine to guide decision-making in practice. Often a research hypothesis is tested with results provided, typically with p values, confidence intervals, or both. Additionally, statistical or research significance is estimated or determined by the investigators. Unfortunately, healthcare providers may have different comfort levels in interpreting these findings, which may affect the adequate application of the data.

Issues of Concern

Without a foundational understanding of hypothesis testing, p values, confidence intervals, and the difference between statistical and clinical significance, healthcare providers may struggle to make clinical decisions without relying purely on the level of significance deemed by the research investigators. Therefore, an overview of these concepts is provided to allow medical professionals to use their expertise to determine if results are reported sufficiently and if the study outcomes are clinically appropriate to be applied in healthcare practice.

Hypothesis Testing

Investigators conducting studies need research questions and hypotheses to guide analyses. Starting with broad research questions (RQs), investigators then identify a gap in current clinical practice or research. Any research problem or statement is grounded in a better understanding of relationships between two or more variables. For this article, we will use the following research question example:

Research Question: Is Drug 23 an effective treatment for Disease A?

Research questions do not directly imply specific guesses or predictions; we must formulate research hypotheses. A hypothesis is a predetermined declaration regarding the research question in which the investigator(s) makes a precise, educated guess about a study outcome. This is sometimes called the alternative hypothesis and ultimately allows the researcher to take a stance based on experience or insight from medical literature. An example of a hypothesis is below.

Research Hypothesis: Drug 23 will significantly reduce symptoms associated with Disease A compared to Drug 22.

The null hypothesis states that there is no statistical difference between groups based on the stated research hypothesis. An example of a null hypothesis is below.

Null Hypothesis: There are no significant differences in the reduction of symptoms for Disease A between Drug 23 and Drug 22.

The null hypothesis is deemed true until a study presents significant data to support rejecting the null hypothesis. Based on the results, the investigators will either reject the null hypothesis (if they found significant differences or associations) or fail to reject the null hypothesis (they could not provide proof that there were significant differences or associations).

Researchers should be aware of journal recommendations when considering how to report p values, and manuscripts should remain internally consistent. Regarding p values, as the number of individuals enrolled in a study (the sample size) increases, the likelihood of finding a statistically significant effect increases. With very large sample sizes, the p-value can be very low even when the difference between groups is small.
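The effect of sample size on the p-value can be illustrated with a minimal sketch. It assumes a two-sided z-test for a difference in means between two equal-sized groups with a known, common standard deviation; the function name and all numbers are hypothetical and are not taken from any study.

```python
from math import sqrt
from statistics import NormalDist


def two_sided_p_for_mean_difference(diff, sd, n_per_group):
    """Two-sided z-test p-value for a mean difference between two equal-sized groups.

    Assumes both groups share a known standard deviation `sd` (an assumption
    made here only to keep the sketch short).
    """
    standard_error = sd * sqrt(2 / n_per_group)
    z = diff / standard_error
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Hypothetical: the same small difference (0.1 standard deviations)
# observed at three different sample sizes.
p_by_n = {n: two_sided_p_for_mean_difference(0.1, 1.0, n) for n in (50, 500, 5000)}
for n, p in p_by_n.items():
    print(f"n per group = {n:>4}: p = {p:.4g}")
```

The observed difference is identical in all three cases; only the sample size changes, and the p-value shrinks as the sample grows.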

To test a hypothesis, researchers obtain data on a representative sample to determine whether to reject or fail to reject a null hypothesis. In most research studies, it is not feasible to obtain data for an entire population. Using a sampling procedure allows for statistical inference, though this involves a certain possibility of error.[1] When determining whether to reject or fail to reject the null hypothesis, two kinds of mistakes can be made: a Type I error (rejecting a null hypothesis that is actually true) and a Type II error (failing to reject a null hypothesis that is actually false). Though it is impossible to ensure that these errors have not occurred, researchers should limit the possibility of these faults.[2]
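A Type I error can be made concrete with a small simulation. The sketch below assumes two groups drawn from the same normal distribution (so the null hypothesis is true by construction) and a z-test with a known standard deviation of 1. The function name, sample sizes, and seed are illustrative choices only.

```python
import random
from math import sqrt
from statistics import NormalDist, mean


def simulate_type_i_error_rate(n_studies=2000, n_per_group=50, alpha=0.05, seed=0):
    """Share of simulated studies that reject a null hypothesis that is true."""
    rng = random.Random(seed)
    standard_error = sqrt(2 / n_per_group)  # known SD of 1 in both groups
    rejections = 0
    for _ in range(n_studies):
        # Both groups come from the same distribution: there is no true effect.
        group_a = [rng.gauss(0, 1) for _ in range(n_per_group)]
        group_b = [rng.gauss(0, 1) for _ in range(n_per_group)]
        z = (mean(group_a) - mean(group_b)) / standard_error
        p = 2 * (1 - NormalDist().cdf(abs(z)))
        if p < alpha:
            rejections += 1  # a false positive, i.e. a Type I error
    return rejections / n_studies


type_i_rate = simulate_type_i_error_rate()
print(f"Share of null studies declared significant: {type_i_rate:.3f}")
```

Under these assumptions, the share of false positives should land close to the chosen threshold of 0.05.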

Significance

Significance is a term to describe the substantive importance of medical research. Statistical significance concerns how likely it is that results as extreme as those observed would arise by chance alone.[3] Healthcare providers should always distinguish statistical significance from clinical significance; conflating the two is a common error when reviewing biomedical research.[4] When conceptualizing findings reported as either significant or not significant, healthcare providers should not simply accept researchers' results or conclusions without considering the clinical significance. Healthcare professionals should consider the clinical importance of findings and understand both p values and confidence intervals so they do not have to rely on the researchers to determine the level of significance.[5] One criterion often used to determine statistical significance is the utilization of p values.

P Values

P values are used in research to determine whether the sample estimate is significantly different from a hypothesized value. The p-value is the probability that an effect at least as large as the one observed within the study would have occurred by chance if, in reality, there was no true effect. Conventionally, data yielding a p<0.05 or p<0.01 is considered statistically significant. While some have debated that the 0.05 level should be lowered, it is still widely practiced.[6] Hypothesis testing allows us to determine whether an effect is statistically detectable; on its own, it does not convey the size of the effect.
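The definition of a p-value as "how often chance alone would produce an effect at least this large" can be sketched with a permutation test, which is one of several ways to obtain a p-value and is not necessarily what any cited study used. The symptom scores below are invented for illustration, and the function name is hypothetical.

```python
import random
from statistics import mean


def permutation_p_value(group_a, group_b, n_permutations=5000, seed=0):
    """Estimate a two-sided p-value by reshuffling group labels at random."""
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        # If there is no true effect, group labels are interchangeable.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            at_least_as_extreme += 1
    # Adding one to numerator and denominator keeps the estimate above zero.
    return (at_least_as_extreme + 1) / (n_permutations + 1)


# Hypothetical symptom scores, invented for illustration.
drug_23 = [1, 2, 1, 0, 2, 1, 3, 1]
drug_22 = [4, 5, 3, 6, 5, 4, 6, 5]
p = permutation_p_value(drug_23, drug_22)
print(f"permutation p-value: {p:.4f}")
```

Because the estimate counts the observed arrangement itself, it can be very small but is never exactly zero.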

Examples of findings reported with p values are below:

Statement: Drug 23 reduced patients' symptoms compared to Drug 22. Patients who received Drug 23 (n=100) were 2.1 times less likely than patients who received Drug 22 (n = 100) to experience symptoms of Disease A, p<0.05.

Or

Statement: Individuals who were prescribed Drug 23 experienced fewer symptoms (M = 1.3, SD = 0.7) compared to individuals who were prescribed Drug 22 (M = 5.3, SD = 1.9). This finding was statistically significant, p = 0.02.

For either statement, if the threshold had been set at 0.05, the null hypothesis (that there was no relationship) should be rejected, and we should conclude significant differences. As can be seen in the two statements above, some researchers will report findings with < or > and others will provide an exact p-value (e.g., 0.000001) but never zero.[6] When examining research, readers should understand how p values are reported. The best practice is to report all p values for all variables within a study design, rather than only providing p values for variables with significant findings.[7] The inclusion of all p values provides evidence for study validity and limits suspicion for selective reporting/data mining.

While researchers have historically used p values, experts who find p values problematic encourage the use of confidence intervals.[8] P-values alone do not allow us to understand the size or the extent of the differences or associations.[3] In March 2016, the American Statistical Association (ASA) released a statement on p values, noting that scientific decision-making and conclusions should not be based on a fixed p-value threshold (e.g., 0.05). They recommend focusing on the significance of results in the context of study design, quality of measurements, and validity of data. Ultimately, the ASA statement noted that in isolation, a p-value does not provide strong evidence.[9]

When conceptualizing clinical work, healthcare professionals should consider p values with a concurrent appraisal of study design validity. For example, a p-value from a double-blinded randomized clinical trial (designed to minimize bias) should be weighted higher than one from a retrospective observational study.[7] The p-value debate has smoldered since the 1950s,[10] and replacement with confidence intervals has been suggested since the 1980s.[11]

Confidence Intervals

A confidence interval (CI) provides a range of values that, at a given level of confidence (e.g., 95%), includes the true value of the statistical parameter within a targeted population.[12] Most research uses a 95% CI, but investigators can set any level (e.g., 90% CI, 99% CI).[13] A CI provides a range with the lower bound and upper bound limits of a difference or association that would be plausible for a population.[14] Therefore, a CI of 95% indicates that if a study were to be carried out 100 times, the calculated range would contain the true value in about 95 of them.[15] Confidence intervals provide more evidence regarding the precision of an estimate compared to p-values.[6]
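The repeated-study interpretation of a 95% CI can be checked with a short simulation. This is a sketch under stated assumptions: normally distributed data, a normal-approximation interval for a single mean, and a true mean, standard deviation, and sample size that are all hypothetical.

```python
import random
from math import sqrt
from statistics import NormalDist, mean, stdev


def simulate_coverage(true_mean=10.0, sd=2.0, n=100, level=0.95,
                      n_studies=1000, seed=1):
    """Share of simulated studies whose CI contains the true mean."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - (1 - level) / 2)  # about 1.96 for 95%
    covered = 0
    for _ in range(n_studies):
        sample = [rng.gauss(true_mean, sd) for _ in range(n)]
        centre = mean(sample)
        half_width = z_crit * stdev(sample) / sqrt(n)
        if centre - half_width <= true_mean <= centre + half_width:
            covered += 1
    return covered / n_studies


coverage = simulate_coverage()
print(f"Share of intervals containing the true mean: {coverage:.3f}")
```

Each simulated study produces a different interval; the confidence level describes how often intervals built this way capture the true value, not the probability for any single interval.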

In consideration of the similar research example provided above, one could make the following statement with 95% CI:

Statement: Individuals who were prescribed Drug 23 had no symptoms after three days, which was significantly faster than those prescribed Drug 22; the mean difference between the two groups in days to recovery was 4.2 days (95% CI: 1.9 – 7.8).

It is important to note that the width of the CI is affected by the standard error and the sample size; reducing the sample size will result in less precision of the CI (increase the width).[14] A larger width indicates a smaller sample size or a larger variability.[16] A researcher would want to increase the precision of the CI. For example, a 95% CI of 1.43 – 1.47 is much more precise than the one provided in the example above. In research and clinical practice, CIs provide valuable information on whether the interval includes or excludes any clinically significant values.[14]
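The relationship between sample size and CI width can be sketched for the simple case of a single mean, where a normal-approximation interval has width 2 × z × SD / √n. The function name and the standard deviation of 2.0 are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def ci_width(sd, n, level=0.95):
    """Width of a normal-approximation CI for a mean: 2 * z * sd / sqrt(n)."""
    z_crit = NormalDist().inv_cdf(1 - (1 - level) / 2)
    return 2 * z_crit * sd / sqrt(n)


# Hypothetical standard deviation of 2.0 at three sample sizes.
for n in (25, 100, 400):
    print(f"n = {n:>3}: 95% CI width = {ci_width(2.0, n):.3f}")
```

Under this approximation, quadrupling the sample size halves the width, and a larger standard deviation widens the interval in direct proportion.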

A CI is sometimes simply checked against the null value (zero for differences and 1 for ratios) to judge statistical significance. However, CIs provide more information than that.[15] Consider this example: A hospital implements a new protocol that reduced wait time for patients in the emergency department by an average of 25 minutes (95% CI: -2.5 to 41 minutes). Because the range crosses zero, implementing this protocol in different populations could result in longer wait times; however, most of the range lies on the positive side. Thus, while the p-value used to detect statistical significance for this may result in "not significant" findings, individuals should examine this range, consider the study design, and weigh whether or not it is still worth piloting in their workplace.
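The null-value check described above is a one-line comparison. The sketch below applies a hypothetical helper, `excludes_null`, to the two intervals quoted in this article; the ratio interval in the last line is invented for illustration.

```python
def excludes_null(lower, upper, null_value=0.0):
    """Return True when the interval does not contain the null value."""
    return not (lower <= null_value <= upper)


# Wait-time reduction from the protocol example: 95% CI of -2.5 to 41 minutes.
print(excludes_null(-2.5, 41.0))            # interval crosses zero
# Days-to-recovery difference from the drug example: 95% CI of 1.9 to 7.8.
print(excludes_null(1.9, 7.8))              # interval excludes zero
# Hypothetical ratio measure, where the null value is 1 rather than 0.
print(excludes_null(0.8, 1.6, null_value=1.0))
```

The check only reproduces the significant/not significant verdict; the bounds themselves still need to be read for clinical relevance.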

Similarly to p-values, 95% CIs cannot control for researchers' errors (e.g., study bias or improper data analysis).[14] In consideration of whether to report p-values or CIs, researchers should examine journal preferences. When in doubt, reporting both may be beneficial.[13] An example is below:

Reporting both: Individuals who were prescribed Drug 23 had no symptoms after three days, which was significantly faster than those prescribed Drug 22, p = 0.009. The mean difference between the two groups in days to recovery was 4.2 days (95% CI: 1.9 – 7.8).

Clinical Significance

Recall that clinical significance and statistical significance are two different concepts. Healthcare providers should remember that a study with statistically significant differences and a large sample size may be of no interest to clinicians, whereas a study with a smaller sample size and statistically non-significant results could impact clinical practice.[14] Additionally, as previously mentioned, a non-significant finding may reflect the study design itself rather than relationships between variables.

Healthcare providers using evidence-based medicine to inform practice should use clinical judgment to determine the practical importance of studies through careful evaluation of the design, sample size, power, likelihood of type I and type II errors, data analysis, and reporting of statistical findings (p values, 95% CI, or both).[4] Interestingly, some experts have called for "statistically significant" or "not significant" to be excluded from work, as statistical significance never has been and never will be equivalent to clinical significance.[17]

The decision on what is clinically significant can be challenging, depending on the providers' experience and especially the severity of the disease. Providers should use their knowledge and experiences to determine the meaningfulness of study results and make inferences based not only on significant or insignificant results reported by researchers but also through their understanding of study limitations and practical implications.

Nursing, Allied Health, and Interprofessional Team Interventions

All physicians, nurses, pharmacists, and other healthcare professionals should strive to understand the concepts in this chapter. These individuals should maintain the ability to review and incorporate new literature for evidence-based and safe care.

FAQs

What are the p-value, confidence interval, and level of significance?

The level of significance provides a cut-off for the p-value beyond which we assert that the findings are 'statistically significant' (by convention, this is p<0.05). A confidence interval that embraces the value of no difference between treatments indicates that the treatment under investigation is not significantly different from the control.

What is the significance level of the p-value in hypothesis testing?

Conventionally, data yielding a p<0.05 or p<0.01 is considered statistically significant. While some have debated that the 0.05 level should be lowered, it is still widely practiced.[6] Hypothesis testing allows us to determine whether an effect is statistically detectable; on its own, it does not convey the size of the effect.

Is a p-value of 0.05 the same as a 95% confidence interval?

In accordance with the conventional acceptance of statistical significance at a P-value of 0.05 or 5%, CIs are frequently calculated at a confidence level of 95%. In general, if an observed result is statistically significant at a P-value of 0.05, then the null hypothesis value should not fall within the 95% CI.

How do you interpret p-values and significance?

A p-value measures the probability of obtaining results at least as extreme as those observed, assuming that the null hypothesis is true. The lower the p-value, the greater the statistical significance of the observed difference. A p-value of 0.05 or lower is generally considered statistically significant.

How to interpret confidence intervals in hypothesis testing?

We first choose the confidence level of the interval; usually we choose the level to be 95%. This would mean that if we were to repeat our experiment 100 times and compute 100 corresponding confidence intervals, approximately 95 of the confidence intervals would contain the population mean.

How to interpret 95% confidence intervals?

Strictly speaking a 95% confidence interval means that if we were to take 100 different samples and compute a 95% confidence interval for each sample, then approximately 95 of the 100 confidence intervals will contain the true mean value (μ).

When to use 0.01 and 0.05 level of significance?

If p > 0.05 and p ≤ 0.1, there is weak evidence against the null hypothesis. If p > 0.01 and p ≤ 0.05, there is strong evidence against the null hypothesis. If p ≤ 0.01, very strong evidence against the null hypothesis is indicated.

What does p = 0.01 mean?

A P-value of 0.01 implies that, assuming the postulated null hypothesis is correct, the difference seen (or an even bigger, “more extreme” difference) in the observed results would occur in 1 in 100 (or 1%) of the times a study was repeated.

What is the significance threshold for a p-value?

A significance threshold is used to convert a p-value into a yes/no or a true/false result. This introduces the possibility of an error: that we conclude something is true based on our test when it is actually not true. A type I error occurs when we obtain a “significant” p-value even though the null hypothesis is actually true.

How do you know if a 95% confidence interval is significant?

So, if your significance level is 0.05, the corresponding confidence level is 95%. If the P value is less than your significance (alpha) level, the hypothesis test is statistically significant. If the confidence interval does not contain the null hypothesis value, the results are statistically significant.

What is the 95% confidence interval for p?

For a confidence interval with level C, the area p in each tail is equal to (1-C)/2. A 95% confidence interval for the standard normal distribution, then, is the interval (-1.96, 1.96), since 95% of the area under the curve falls within this interval.
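The tail-area relationship can be checked numerically. This minimal sketch uses the standard normal distribution from Python's `statistics` module; `critical_value` is a hypothetical helper name.

```python
from statistics import NormalDist


def critical_value(confidence_level):
    """Standard normal critical value that leaves (1 - C) / 2 in each tail."""
    tail_area = (1 - confidence_level) / 2
    return NormalDist().inv_cdf(1 - tail_area)


for level in (0.90, 0.95, 0.99):
    z = critical_value(level)
    print(f"{level:.0%} confidence: interval is (-{z:.3f}, {z:.3f})")
```

Higher confidence levels leave less area in the tails and therefore give wider intervals.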

How do you compare 95% confidence intervals?

If the 95% confidence intervals are known for two sample means, there is a simple test to determine whether those sample means are significantly different. If the 95% CIs for the two sample means do not overlap, the means are significantly different at the P < 0.05 level. The reverse does not hold: two means can differ significantly even when their 95% CIs overlap.
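The overlap rule is conservative, which a small numeric sketch can show. It assumes two independent estimates with known standard errors and a z-test on their difference; the means and standard errors are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

Z95 = NormalDist().inv_cdf(0.975)  # about 1.96


def ci95(estimate, se):
    """Normal-approximation 95% CI for a single estimate."""
    return (estimate - Z95 * se, estimate + Z95 * se)


def intervals_overlap(a, b):
    """True when two (lower, upper) intervals share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


def p_for_difference(mean_1, se_1, mean_2, se_2):
    """Two-sided z-test p-value for the difference of two independent means."""
    z = (mean_2 - mean_1) / sqrt(se_1 ** 2 + se_2 ** 2)
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Hypothetical means 0 and 3, each with a standard error of 1.
ci_a, ci_b = ci95(0.0, 1.0), ci95(3.0, 1.0)
overlap = intervals_overlap(ci_a, ci_b)
p = p_for_difference(0.0, 1.0, 3.0, 1.0)
print(f"CIs overlap: {overlap}, p for the difference: {p:.3f}")
```

In this example the intervals overlap, yet the test on the difference falls below 0.05, so overlapping intervals alone do not settle the question.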

What is the p-value and significance level for dummies?

Over the years, the value of 0.05 has become accepted as a reasonable criterion for declaring significance. If you adopt the criterion that p must be less than or equal to 0.05 to declare significance, then you'll keep the chance of making a Type I error to no more than 5 percent.

What is the difference between p-value and significance level?

The p-value represents the strength of evidence against the null hypothesis, while the significance level represents the level of evidence required to reject the null hypothesis. If the p-value is less than the significance level, the null hypothesis is rejected, and the alternative hypothesis is accepted.

What do the p-value and significance level say about the conclusion?

If the p-value is lower than the level of significance, reject the null hypothesis, and make the conclusion that supports the potential change. If the p-value is higher, fail to reject the null hypothesis, and make the conclusion that supports the status quo. For example, comparing a p-value of 0.0574 with a level of significance of 0.05, we have 0.0574 > 0.05, so we fail to reject the null hypothesis.

What is the level of significance for a 95% confidence interval?

Level of significance is a statistical term for how willing you are to be wrong. A 95 percent confidence interval corresponds to a level of significance of 5 percent: about 5 percent of intervals constructed this way would miss the true value.

When is a p-value of 0.05 considered significant?

If the p-value is less than 0.05, it is judged as “significant,” and if the p-value is greater than 0.05, it is judged as “not significant.” However, since the significance level is a value set by the researcher according to the circumstances of each study, it does not necessarily have to be 0.05.

How to find the p-value?

The p-value is calculated using the sampling distribution of the test statistic under the null hypothesis, the sample data, and the type of test being done (lower-tailed test, upper-tailed test, or two-sided test). The p-value for a lower-tailed test is specified by: p-value = P(TS ≤ ts | H0 is true) = cdf(ts)
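The three test types can be sketched for the special case where the test statistic follows a standard normal distribution under the null hypothesis. That distributional assumption, the function name `p_value`, and the upper-tailed and two-sided formulas (the standard counterparts of the lower-tailed one) are additions for illustration.

```python
from statistics import NormalDist


def p_value(ts, tail="two-sided"):
    """P-value for an observed test statistic `ts`, assumed standard normal under H0."""
    cdf = NormalDist().cdf
    if tail == "lower":
        return cdf(ts)                  # P(TS <= ts | H0 is true)
    if tail == "upper":
        return 1 - cdf(ts)              # P(TS >= ts | H0 is true)
    if tail == "two-sided":
        return 2 * (1 - cdf(abs(ts)))   # both tails beyond |ts|
    raise ValueError("tail must be 'lower', 'upper', or 'two-sided'")


print(f"lower-tailed, ts = -1.645: {p_value(-1.645, 'lower'):.3f}")
print(f"upper-tailed, ts =  1.645: {p_value(1.645, 'upper'):.3f}")
print(f"two-sided,    ts =  1.960: {p_value(1.96):.3f}")
```

For other tests (for example, a t-test or chi-square test), the same logic applies with the cumulative distribution function of the relevant distribution.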

What is the significance level of the p-value in regression?

A common threshold of the P-value is 0.05. Note: A threshold of 0.05 means that when the null hypothesis is true, we will falsely reject it 5% of the time. It means that we accept that, 5% of the time, we might falsely conclude that there is a relationship.

Article information

Author: Rueben Jacobs