What is test-retest reliability and why is it important? - Cambridge Cognition

15 September 2016

Digital Health & Innovation Scientist Matthew Hobbs explores what test re-test reliability is, how you would measure it, and why it is important when choosing cognitive tests.

What is test re-test reliability?

When you come to choose the measurement tools for your experiment, it is important to check that they are valid (i.e. that they appropriately measure the construct or domain in question) and that they can reliably reproduce the result more than once in the same situation and population.

In an experiment with multiple time points, you would hope that the measurement tool chosen could consistently reproduce the same result across all visits, provided all other variables remain the same. Tools which do provide such consistency are regarded as having high test re-test reliability, and are therefore appropriate for use in longitudinal research.

Why is it important to choose measures with good reliability?

Having good test re-test reliability signifies the internal validity of a test and ensures that the measurements obtained in one sitting are both representative and stable over time. Often, test re-test reliability analyses are conducted at two time points (T1, T2) separated by a relatively short interval, to reduce the risk that differences reflect age-related changes in performance rather than poor test stability.

Without good reliability, it is difficult for you to trust that the data provided by the measure is an accurate representation of the participant’s performance rather than due to irrelevant artefacts in the testing session such as environmental, psychological or methodological processes.

Often your aim in research will be to evaluate the impact of an intervention on an individual’s performance. Without confidence that the measure you’ve chosen is reliable, it is difficult to ascertain whether differences in performance pre- and post-intervention are genuinely due to the intervention provided and not an artefact of the tool.

A tool with low reliability can therefore mask the true effects of an intervention, which could have serious ramifications for the conclusions drawn, and therefore for the future progression of that intervention.

How is test re-test reliability calculated?

Traditionally, the approach to assessing the reliability of scores has been to ascertain the magnitude of relationship between the test statistics. Thus, if a measurement tool consistently produces the same result, the relationship between those data points would be high.

To answer the question of relationship, researchers have often turned to calculating the correlation coefficient (r) which measures the strength of relationship. A measurement tool providing the same data output at every time point would therefore produce a perfect linear correlation of r = 1.

However, whilst it is useful to know the degree of relationship between the data points, the true question we are aiming to ascertain with test re-test reliability is the magnitude of agreement between the time points rather than the relationship.

When we use the same measure in the same population over T1 and T2, it is very possible to obtain a high degree of relationship as measured through the correlation coefficient, yet show a poor level of agreement (Bland & Altman, 1986).
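The gap between relationship and agreement can be illustrated with a small sketch. The scores below are hypothetical, invented purely for illustration, and the helper name pearson_r is an assumption of this example rather than anything from the original analysis. Every participant scores exactly 8 points higher at T2, so the correlation is perfect even though no single pair of scores agrees.

```python
import math
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length score lists."""
    mean_x, mean_y = mean(x), mean(y)
    covariation = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sum((a - mean_x) ** 2 for a in x)
    spread_y = sum((b - mean_y) ** 2 for b in y)
    return covariation / math.sqrt(spread_x * spread_y)


# Hypothetical scores, invented for illustration only.
t1 = [10, 14, 18, 22, 26]
t2 = [score + 8 for score in t1]  # a constant shift of 8 points at T2

r = pearson_r(t1, t2)                                     # strength of relationship
mean_difference = mean([b - a for a, b in zip(t1, t2)])   # systematic disagreement

print(f"r = {r:.2f}, mean difference (T2 - T1) = {mean_difference}")
```

Here the correlation coefficient alone would suggest an ideal tool, while the mean difference shows that T2 scores are systematically shifted away from T1 scores.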

Agreement between data points, rather than the relationship between them, can be assessed with Bland and Altman’s (1986) statistical procedure, which summarises the lack of agreement by calculating the bias between the two sets of measurements.

By plotting the data and calculating the difference between the two scores for each participant, the mean of those differences (the mean difference) and their standard deviation, we can assess how well the measures agree. If the differences are approximately normally distributed, we would expect about 95% of them to lie within two standard deviations of the mean difference, and how far the points fall from the line of equality shows how well the two time points agree.
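As a minimal sketch of the calculation just described, the function below computes the mean difference and the interval around it, often called the limits of agreement. The data are hypothetical, the function name bland_altman is an assumption of this example, and the direction of the difference (T2 minus T1) is a choice made here, not something the text specifies. The multiplier defaults to 1.96, the normal-distribution value that "two standard deviations" rounds.

```python
from statistics import mean, stdev


def bland_altman(t1, t2, multiplier=1.96):
    """Return the mean difference, the SD of the differences, and the limits of agreement."""
    differences = [b - a for a, b in zip(t1, t2)]  # T2 - T1 per participant (assumed direction)
    bias = mean(differences)                       # mean difference
    sd = stdev(differences)                        # sample standard deviation of the differences
    limits = (bias - multiplier * sd, bias + multiplier * sd)
    return bias, sd, limits


# Hypothetical scores for six participants, invented for illustration only.
t1 = [12, 15, 9, 20, 14, 11]
t2 = [13, 18, 8, 22, 14, 12]

bias, sd, limits = bland_altman(t1, t2)
print(f"mean difference = {bias}, SD = {sd:.3f}")
print(f"limits of agreement = ({limits[0]:.3f}, {limits[1]:.3f})")
```

Whether an interval of this width counts as good agreement is not a statistical question; it depends on how large a difference matters for the measure in question.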

CANTAB test re-test reliability

Many papers exist in the literature calculating the test-retest reliability of our CANTAB tests, with the overall conclusion demonstrating relatively good reliability (Lowe & Rabbitt, 1998).

However, the conclusions drawn from literature-based analyses depend heavily on the outcome measures the researchers chose to investigate, which are often those related to their own research question.

It is therefore not appropriate to summarise a test re-test reliability conclusion for an overall CANTAB task based on the analysis of one outcome measure, but instead to assess multiple outcome measures, especially those applicable to the majority of research projects.

We are currently conducting an updated analysis of the test re-test reliability credentials of our tasks using these more appropriate statistical measures, across outcome measures more commonly recommended for experiments.

Some of our preliminary analyses are trickling in, and we are excited to share these results in the near future.

As a teaser, have a look at our latest analysis of the Paired Associates Learning task (PAL). The data below is generated using the PAL Total Errors Adjusted (PALTEA) outcome measure comparing two separate visits by the same participants (N = 45).

[Figure: Bland-Altman plot of PALTEA scores from two visits by the same participants (N = 45)]

The Bland-Altman plot above is a special variation of a scatter plot. The x-axis shows, for each participant, the mean of the T1 and T2 scores; the y-axis shows the difference between the two scores. The solid horizontal line represents the overall mean difference, while a dashed line marks “zero difference”: in the ideal case of perfect agreement between two methods or, as in our case, identical T1 and T2 scores, all the points would lie on this dashed line. The upper dashed line indicates 2 standard deviations above the mean difference, and the lower dashed line 2 standard deviations below it. Scientists have to define acceptable limits for the Bland-Altman plot a priori, based on clinical and experimental considerations and goals. Finally, the bar plots on the graph axes show the score frequency distributions: the higher the bar, the greater the number of points with a given value in our dataset.
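The quantities behind such a plot, and one way of applying a priori limits, can be sketched as follows. This is an illustration under stated assumptions, not the analysis used for the PAL data: the scores are hypothetical, the function names are invented for this example, the difference is taken as T2 minus T1, and "acceptable" is read here as the plotted 2 standard deviation lines both falling inside the limits chosen in advance.

```python
from statistics import mean, stdev


def bland_altman_points(t1, t2):
    """Plot coordinates: x = per-participant mean of T1 and T2, y = T2 - T1."""
    return [((a + b) / 2, b - a) for a, b in zip(t1, t2)]


def two_sd_lines(differences, multiplier=2.0):
    """Positions of the lower and upper dashed lines around the mean difference."""
    centre = mean(differences)
    spread = multiplier * stdev(differences)
    return centre - spread, centre + spread


def meets_a_priori_limits(t1, t2, acceptable_low, acceptable_high):
    """True if both 2 SD lines fall inside the limits defined before the study."""
    differences = [y for _, y in bland_altman_points(t1, t2)]
    low, high = two_sd_lines(differences)
    return acceptable_low <= low and high <= acceptable_high


# Hypothetical scores, invented for illustration only.
t1 = [12, 15, 9, 20, 14, 11]
t2 = [13, 18, 8, 22, 14, 12]

points = bland_altman_points(t1, t2)
lenient = meets_a_priori_limits(t1, t2, -5, 5)  # differences up to 5 points tolerated
strict = meets_a_priori_limits(t1, t2, -3, 3)   # differences up to 3 points tolerated
print(points[0], lenient, strict)
```

The same data pass one criterion and fail the other, which is why the acceptable limits need to be fixed before looking at the plot.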

The data above is a brief snapshot of the analysis currently underway to re-confirm the test-retest reliability of our CANTAB tasks on our newest platform, CANTAB Connect. We are building upon the published literature, which already shows good test-retest reliability for CANTAB, to provide a more comprehensive body of work for all of our tasks across a wider variety of outcome measures. We will keep you updated on the progress of this project, and look forward to presenting more reliability data once we have finished crunching the numbers.

References

Bland, J. M., & Altman, D. G. (1986). Statistical methods for assessing agreement between two methods of clinical measurement. The Lancet, 327(8476), 307–310. doi:10.1016/s0140-6736(86)90837-8

Giavarina, D. (2015). Understanding Bland Altman analysis. Biochemia Medica, 25(2), 141–151. http://doi.org/10.11613/BM.2015.015

Lowe, C., & Rabbitt, P. (1998). Test/re-test reliability of the CANTAB and ISPOCD neuropsychological batteries: Theoretical and practical issues. Neuropsychologia, 36(9), 915–923. doi:10.1016/s0028-3932(98)00036-0

Author:

Matthew Hobbs

Digital Health & Innovation Scientist
