What is gold standard and what is ground truth? (2024)

"What has not been examined impartially, has not been well examined. Scepticism is therefore the first step towards truth." (Denis Diderot, Philosopher)

Clinical decision-making is complex and based upon accurate evaluation of clinical findings using diagnostic tests and reference standard data. Given that many aspects of dental examination are not direct measures, but rely on indirect measures, it is important for clinicians to understand the basic principles and terms used to assess the accuracy of diagnostic tests and to appropriately evaluate published literature regarding these tests. Luckily, there is a variety of readily available metric systems to assess the quality of diagnostic test studies and to help clinicians better understand evidence-based literature.

Dentistry, or shall we say Clinical Dentistry, is becoming more complex and patients have been better informed. Importantly, health care has also shifted focus to emphasize evidence-based practice (EBP). EBP is considered the gold standard for health professional decision-making. No one can deny that the activities in the field of evidence-based Dentistry have grown exponentially in the last decade. However, we cannot forget that Pierre Fauchard (1678-1761) may have been the first to warn the dental field about the concept of evidence, taking into consideration the practices of the time. Fauchard and James Lind (1716-1790) were both concerned about the health of sailors dying of scurvy and, for this reason, conceptualized a "clinical trial" involving the use of vitamin C to counteract the disease. The former even tested techniques for the removal of caries, dental restoration and implants.

The true meaning of evidence-based Dentistry is grounded in a solid understanding and application of clinical epidemiology principles to reduce any confusion that may exist due to academic training. Epidemiology is defined as the "Science of making predictions about individual patients or a group, by recounting clinical events in similar patients in order to ensure that the predictions are correct". Clinical epidemiology is "a subfield that applies the principles and methods of epidemiology to study the occurrence and outcomes of disease in people with a given illness".1

The abilities to precisely define a question of interest (clinical question), derive relevant information from databases, differentiate research methodology, select statistical procedures, critically evaluate studies and understand their implications for care are all required skills.2 However, let us not be too optimistic; there are drawbacks. Ironically, political, social and economic pressure limits the time available for practitioners to seek answers to clinical questions. Furthermore, there is a surprising number of weekly published studies, from the best to the worst.

This paper will discuss a clinical question, among several that can be built "epidemiologically", specifically, diagnostic test accuracy. In other words, the study will provide estimates of the ability of a diagnostic test to discriminate between patients with or without a pre-defined health condition, comparing the results with a standard reference test. There will always be one predictor variable (result of the test) and an outcome (presence or absence of the disease).3 Furthermore, we add the concept of ground truth, which is a set of measures known to be more accurate than the measurements of the system you are testing.

The term gold standard refers to a benchmark that is the best available under reasonable conditions. Indeed, it is not the perfect test, but merely the best available one that has a standard with known results. This is especially important when faced with the impossibility of direct measurements.4 In Dentistry, for example, micro computed tomography can be considered a gold standard for the diagnosis of proximal carious lesions of posterior teeth, as microscopic examination of the enamel has demonstrated its accuracy.5 In the past, referring to an examination as the gold standard meant that it was unqualifiedly the most accurate procedure. However, in present clinical practice, even though the intent of the term has not changed, its use is dependent upon the context of the statistical method being used.

A gold standard study may refer to an experimental model that has been thoroughly tested and has a reputation in the field as a reliable method. The correct interpretation of a diagnostic test demands mastery of specific concepts such as sensitivity, specificity, prevalence, and positive and negative predictive values. The sensitivity of a test is defined as the proportion of people with the disease who test positive (true-positive rate). The specificity of a test is the proportion of people without the disease who have a negative test (true-negative rate). In some literature, one can find the term 1-specificity, which is defined as the rate of false positives (in other words, the percentage of people without the disease who are incorrectly identified as positive). Typically, a Receiver Operating Characteristic (ROC) curve is used as a graphical representation of the relationship between sensitivity and specificity: it plots sensitivity against 1-specificity across the possible cut-off values. The area under the curve represents the accuracy of the test. The closer the value is to one, the greater the test accuracy. In many clinical scenarios, there is a trade-off between sensitivity and specificity. This trade-off is related to the fact that some people will clearly be normal while others will clearly have the condition. However, there will inevitably be a group of patients who fall in a middle zone (neither clearly normal nor abnormal). In such instances, an arbitrary cut-off will be used to distinguish between normal and abnormal. Any screening test used to distinguish between patients in this circumstance will have a trade-off between sensitivity and specificity. One way to address this dilemma is to use a combination of diagnostic tests to develop a diagnosis.
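The definitions above can be illustrated with a minimal Python sketch. The scores and disease labels are hypothetical values invented for illustration (they are not data from any cited study), and the function names are assumptions. The sketch computes sensitivity and specificity at each cut-off, builds the ROC points, and estimates the area under the curve with the trapezoidal rule.

```python
from typing import List, Tuple


def sensitivity_specificity(tp: int, fn: int, tn: int, fp: int) -> Tuple[float, float]:
    """Return (sensitivity, specificity) from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # people with the disease who test positive
    specificity = tn / (tn + fp)  # people without the disease who test negative
    return sensitivity, specificity


def roc_points(scores: List[float], diseased: List[bool]) -> List[Tuple[float, float]]:
    """(1 - specificity, sensitivity) for every cut-off, strictest first."""
    points = [(0.0, 0.0)]  # a cut-off above every score calls nobody positive
    for cutoff in sorted(set(scores), reverse=True):
        tp = sum(s >= cutoff and d for s, d in zip(scores, diseased))
        fn = sum(s < cutoff and d for s, d in zip(scores, diseased))
        fp = sum(s >= cutoff and not d for s, d in zip(scores, diseased))
        tn = sum(s < cutoff and not d for s, d in zip(scores, diseased))
        sens, spec = sensitivity_specificity(tp, fn, tn, fp)
        points.append((1.0 - spec, sens))
    return points


def area_under_curve(points: List[Tuple[float, float]]) -> float:
    """Trapezoidal area under consecutive ROC points."""
    return sum((x1 - x0) * (y0 + y1) / 2.0
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Hypothetical test scores (higher = more suspicious) and reference-standard results.
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
diseased = [True, True, False, True, False, True, False, False]

points = roc_points(scores, diseased)
print("ROC points:", points)
print("AUC:", area_under_curve(points))
```

Lowering the cut-off moves along the curve toward higher sensitivity and lower specificity, which is the trade-off described above.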

Positive predictive value is the probability that patients who test positive truly have the condition of interest. Negative predictive value, on the other hand, is defined as the probability that patients who test negative truly do not have the disease. It is important to recognize that the predictive values of diagnostic tests are influenced by the prevalence of the disease in the population being tested. Prevalence is the probability that an individual in a population has the disease (based on clinical characteristics and demographic data) and includes both newly diagnosed cases and existing cases. Likelihood ratio is the ratio between the probability of a particular outcome of a diagnostic test in individuals with the disease and the probability of that same outcome in individuals without the disease. The likelihood ratio may be positive or negative.6
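These measures can be written compactly in terms of a 2x2 table comparing the test with the reference standard. The symbols TP, FP, FN and TN (true positives, false positives, false negatives, true negatives) and N (total number tested) are notation introduced here for illustration; they do not appear in the original text.

```latex
\begin{aligned}
\text{Sensitivity} &= \frac{TP}{TP + FN}, &
\text{Specificity} &= \frac{TN}{TN + FP}, \\[4pt]
\text{PPV} &= \frac{TP}{TP + FP}, &
\text{NPV} &= \frac{TN}{TN + FN}, \\[4pt]
\text{Prevalence} &= \frac{TP + FN}{N}, &
N &= TP + FP + FN + TN, \\[4pt]
LR^{+} &= \frac{\text{Sensitivity}}{1 - \text{Specificity}}, &
LR^{-} &= \frac{1 - \text{Sensitivity}}{\text{Specificity}}.
\end{aligned}
```

Sensitivity and specificity are computed within the diseased and non-diseased columns respectively, whereas the predictive values are computed within the test-positive and test-negative rows, which is why the latter depend on prevalence.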

To best understand how and why diagnostic tests function, a basic understanding of Bayes' theorem is needed. Bayes defined probability as "the ratio between the value at which an expectation depending on the happening of the event ought to be computed, and the value of the thing expected upon its happening".7 For example, the probability that a person with a positive test for oral cancer actually has the condition depends not only on the relationship between events, but also on the accuracy of the test and the prevalence of the condition in the population sample. Thus, if one wishes to evaluate the operating characteristics of a diagnostic test and selects a sample consisting of only a few people with oral cancer, whereas another individual evaluates the same diagnostic test in a sample with a greater proportion of people with oral cancer, test sensitivity, specificity, positive and negative predictive values may vary considerably even though the test procedure was identical.8
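The dependence on prevalence can be sketched with Bayes' theorem. The function name and the values used (sensitivity and specificity of 0.90, prevalences of 1%, 10% and 50%) are hypothetical and chosen only for illustration. The sketch holds sensitivity and specificity fixed to isolate the effect of prevalence on the predictive values.

```python
from typing import Tuple


def predictive_values(sensitivity: float, specificity: float,
                      prevalence: float) -> Tuple[float, float]:
    """Return (PPV, NPV) for a test applied to a population with the given prevalence."""
    true_pos = sensitivity * prevalence                    # P(test+ and disease)
    false_pos = (1.0 - specificity) * (1.0 - prevalence)   # P(test+ and no disease)
    true_neg = specificity * (1.0 - prevalence)            # P(test- and no disease)
    false_neg = (1.0 - sensitivity) * prevalence           # P(test- and disease)
    ppv = true_pos / (true_pos + false_pos)  # Bayes: P(disease | test+)
    npv = true_neg / (true_neg + false_neg)  # Bayes: P(no disease | test-)
    return ppv, npv


# Hypothetical test with sensitivity = specificity = 0.90 at three prevalences.
for prevalence in (0.01, 0.10, 0.50):
    ppv, npv = predictive_values(0.90, 0.90, prevalence)
    print(f"prevalence={prevalence:.2f}  PPV={ppv:.3f}  NPV={npv:.3f}")
```

With these assumed values, the positive predictive value rises from about 0.08 at 1% prevalence to 0.50 at 10% and 0.90 at 50%, even though the test itself is unchanged.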

An ideal diagnostic method hypothetically presents a sensitivity of 100% with respect to detection of injury or illness (identifying all cases of injury or disease in all specimens or individuals evaluated, with no false negatives) and a specificity of 100% (without false positives, that is, never pointing to injury or illness where there is none). In practice, however, there is no perfect gold standard. Instead, we have the method with the greatest sensitivity and the highest specificity currently available. Therefore, the gold standard diagnostic method of the past has probably been replaced today.

Higher sensitivity values increase negative predictive values. Higher specificity values increase positive predictive values. Thus, if a test had perfect sensitivity and specificity, all people with a positive test result would have the disease, while all patients with a negative test would not have the disease. In practice, there is a trade-off between these values. This concept is important in instances in which the diseases have a poor prognosis. In these cases, one might want the test to have higher specificity so as not to unduly distress patients with lots of false positive results. Alternatively, if a disease is easily treatable, it might be more important to screen the population at risk by means of a test with higher sensitivity and less specificity. For patients who are a false positive, a second test can be used to confirm diagnosis.9

For example, in Medicine, contrast angiography (arteriography) was a former gold standard for heart disease. A recent study reported the sensitivity of angiography to be 66.5% and the specificity to be 82.6%. Now magnetic resonance angiography (MRA) has become the new gold standard, with a reported sensitivity of 86.5% and a specificity of 83.4%.10 The acceptance of a new gold standard default method takes time and exhaustive evidence, especially if the internal validity is consistent and acceptable.
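As a rough illustration, the sensitivity and specificity figures quoted above can be converted into positive and negative likelihood ratios, using the standard formulas LR+ = sensitivity / (1 - specificity) and LR- = (1 - sensitivity) / specificity. The function name is an assumption, and the calculation is only a sketch applied to the percentages as reported in this text, not a reanalysis of the cited trial.

```python
from typing import Tuple


def likelihood_ratios(sensitivity: float, specificity: float) -> Tuple[float, float]:
    """Return (LR+, LR-) for a test with the given sensitivity and specificity."""
    lr_positive = sensitivity / (1.0 - specificity)   # how much a positive result raises the odds
    lr_negative = (1.0 - sensitivity) / specificity   # how much a negative result lowers the odds
    return lr_positive, lr_negative


# Sensitivity and specificity as quoted in the paragraph above.
reported = {
    "angiography": (0.665, 0.826),
    "MRA": (0.865, 0.834),
}

for name, (sens, spec) in reported.items():
    lr_pos, lr_neg = likelihood_ratios(sens, spec)
    print(f"{name}: LR+ = {lr_pos:.2f}, LR- = {lr_neg:.2f}")
# angiography: LR+ = 3.82, LR- = 0.41
# MRA: LR+ = 5.21, LR- = 0.16
```

On these figures, a negative MRA result lowers the odds of disease considerably more than a negative angiography result, which reflects its higher sensitivity.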

As for ground truth, it can signify the mean value from the collection of data from a particular experimental model (that preferentially uses a gold standard method) representing a behavioral reference. For example, using a universal shear testing machine to evaluate the strength of a new resin for bracket bonding, we obtain a value of X. This value can be compared to a reference value obtained by previous observations. Thus, if the resulting X value is similar to or higher than those found in ground truth, it can be said that this new resin has an appropriate value. There is a consensus that the clinical resistance pattern for bracket bonding corresponds to something around 6.8 MPa (this value fits the definition of ground truth better than that of gold standard, as it cannot be precisely checked).11 So this value can be used as reference ground truth to accept or reject the hypothesis that a particular new resin has admissible clinical strength or resistance. Therefore, in simple terms, a gold standard test refers to a diagnostic method with the best accuracy; whereas ground truth represents the reference values used as standard for comparison purposes.

In a recent study, authors classified midpalatal suture ossification in five maturation stages.12 A total of 140 cone-beam computed tomography (CBCT) scans of the midpalatal suture were collected and blindly classified into five stages. The images were used as ground truth reference. Subsequently, 30 images were randomly evaluated and reclassified by three experienced orthodontists. The authors found strong agreement in the proposed classification method, with kappa index ranging from 0.82 to 0.93. However, for this diagnostic method of suture maturation to become a gold standard, histological confirmation is required to test specificity and sensitivity. In other words, it should be tested whether CBCT scans of "no suture" really mean midpalatal suture tissue absence or the opposite in their five stages.
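The kappa index mentioned above measures agreement beyond what chance alone would produce. The sketch below implements unweighted Cohen's kappa for two sets of ratings. This is an assumption, since the text does not state which kappa variant the study used, and the stage labels and ratings are hypothetical values invented for illustration.

```python
from collections import Counter
from typing import Sequence


def cohen_kappa(rater_a: Sequence[str], rater_b: Sequence[str]) -> float:
    """Unweighted Cohen's kappa for two raters classifying the same items."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("Both raters must classify the same non-empty set of items.")
    n = len(rater_a)
    # Observed agreement: proportion of items given the same category.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: expected overlap given each rater's category frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical category throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical stage labels for ten images: reference classification vs. one examiner.
reference = ["A", "A", "B", "B", "C", "C", "D", "D", "E", "E"]
examiner = ["A", "A", "B", "C", "C", "C", "D", "D", "E", "D"]

print(f"kappa = {cohen_kappa(reference, examiner):.2f}")  # kappa = 0.75
```

In this invented example the examiner agrees with the reference on 8 of 10 images, but because 20% agreement is expected by chance, kappa is 0.75 rather than 0.80. For ordered stages, a weighted kappa that penalizes near-misses less than distant disagreements is a common alternative.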

When a clinician or researcher is interested in critiquing a study that describes the process for evaluating a diagnostic test, or in conducting such a study, it is important to note that studies of diagnostic tests should follow the rules described in the literature. The Standards for Reporting of Diagnostic Accuracy Studies (STARD)13 is a list containing 25 items used to critically evaluate the quality of a particular diagnostic test study. Another accepted format used to evaluate studies of diagnostic tests is the Quality Assessment of Studies of Diagnostic Accuracy Included in Systematic Reviews (QUADAS).14 The latter is a 14-item checklist (answers can be "yes", "no" or "unclear") used to assess potential risk of bias in diagnostic accuracy studies included in systematic reviews. Systematic reviews of these studies may follow the format proposed by the Cochrane Collaboration, available in the Cochrane Handbook for Systematic Reviews of Diagnostic Test Accuracy (http://srdta.cochrane.org/handbook-dta-reviews).

REFERENCES

1. Portney LG, Watkins MP. Foundations of clinical research: applications to practice. 3rd ed. New Jersey: Prentice Hall Health; 2009.

2. Cardoso JR. Fontes SV, Fukujima MM, Cardeal JM. Fisioterapia neurofuncional. Fundamentos para a prática. São Paulo: Atheneu; 2007. Fisioterapia baseada em evidências; pp. 29–38.

3. Korevaar DA, van Enst WA, Spijker R, Bossuyt PM, Hooft L. Reporting quality of diagnostic accuracy studies: a systematic review and meta-analysis of investigations on adherence to STARD. Evid Based Med. 2014;19(2):47–54.

4. Versi E. "Gold standard" is an appropriate term? BMJ. 1992;305(6846):187.

5. Soviero VM, Leal SC, Silva RC, Azevedo RB. Validity of MicroCT for in vitro detection of proximal carious lesions in primary molars. J Dent. 2012;40(1):35–40.

6. Haynes RB, Sackett DL, Guyatt GH, Tugwell P. Clinical epidemiology: how to do clinical practice research. 3rd ed. Philadelphia: Lippincott Williams & Wilkins; 2006.

7. An essay towards solving a problem in the doctrine of chances by the late Rev Mr. Bayes, communicated by Mr. Price, in a letter to John Canton MA and FRS. Read December 23, 1763. First publication. Philos Trans R Soc Lond. 1764;53:370–418. http://www.stat.ucla.edu/history/essay.pdf

8. Mazur DJ. A history of evidence in medical decisions: from the diagnostic sign to Bayesian inference. Med Decis Making. 2012;32(2):227–231.

9. Saah AJ, Hoover DR. "Sensitivity" and "specificity" reconsidered: the meaning of these terms in analytical and diagnostic settings. Ann Intern Med. 1997;126(1):91–94.

10. Greenwood JP, Maredia N, Younger JF, Brown JM, Nixon J, Everett CC, et al. Cardiovascular magnetic resonance and single-photon emission computed tomography for diagnosis of coronary heart disease (CE-MARC): a prospective trial. Lancet. 2012;379(9814):453–460.

11. Reynolds IR. A review of direct orthodontic bonding. Br J Orthod. 1975;2:171–178.

12. Angelieri F, Cevidanes LH, Franchi L, Gonçalves JR, Benavides E, McNamara JA Jr. Midpalatal suture maturation: classification method for individual assessment before rapid maxillary expansion. Am J Orthod Dentofacial Orthop. 2013;144(5):759–769.

13. Bossuyt PM, Reitsma JB, Bruns DE, Gatsonis CA, Glasziou PP, Irwig LM, et al. Towards complete and accurate reporting of studies of diagnostic accuracy: the STARD initiative. Standards for Reporting of Diagnostic Accuracy. Clin Chem. 2003;49:1–6.

14. Whiting P, Rutjes AW, Reitsma JB, Bossuyt PM, Kleijnen J. The development of QUADAS: a tool for the quality assessment of studies of diagnostic accuracy included in systematic reviews. BMC Med Res Methodol. 2003;10:25.

FAQs

What is gold standard and what is ground truth?

While the gold standard is a best effort to obtain the truth, ground truth is typically collected by direct observations. In machine learning and information retrieval, "ground truth" is the preferred term even when classifications may be imperfect; the gold standard is assumed to be the ground truth.


What is the gold standard approach?

In medicine and medical statistics, the gold standard, criterion standard, or reference standard is the diagnostic test or benchmark that is the best available under reasonable conditions. It is the test against which new tests are compared to gauge their validity, and it is used to evaluate the efficacy of treatments.

What is an example of a gold standard test?

For example, the gold standard test for Alzheimer's requires a biopsy on brain tissue, which can only be carried out post-mortem. Such a test would still, however, be used as the standard against which other tests are assessed. As new diagnostic methods become available, the "gold standard" test may change over time.

What is the gold standard in healthcare?

In healthcare, a gold standard is a method, procedure or measurement that is widely accepted as being the best available to test for or treat a disease.


What type of evidence is considered the gold standard?

The randomized controlled trial (RCT) is commonly regarded as the gold standard of evidence.


What is the difference between gold standard and ground truth?

The term ground truth refers to the underlying absolute state of information; the gold standard strives to represent the ground truth as closely as possible. While the gold standard is a best effort to obtain the truth, ground truth is typically collected by direct observations.


What is a synonym for gold standard?

A high standard by which others are measured. Synonyms include benchmark, standard, barometer and yardstick.

What is an example of a ground truth?

Used in statistics and machine learning, Ground Truth is data that we assume to be true. For example, you have two images. One image depicts a dog, and the other a cat. We know this to be true because we, as humans, have the ability to recognise different animals.

What is the ground truth?

Ground truth is information that is known to be real or true, provided by direct observation and measurement (i.e. empirical evidence) as opposed to information provided by inference.

What is ground truth vs official truth?

Official truth is established with few or no data collected on the ground. The two are complementary concepts: ground truth is the empirical evidence, the geographic reality, collected on location or from proofs of existence such as satellite images.

What does gold standard mean in research?

In medicine and social sciences, the phrase “gold standard” is often used to characterize an object or procedure described as unequivocally the best in its genre, against which all others should be compared.