So often in veterinary medicine, as in other medical fields, we are looking for diagnostic tests to aid in treatment and to prognosticate for various diseases. Both infectious and noninfectious diseases may be diagnosed by detecting the causative agent, clinical signs, pathological changes, biochemical changes or surrogate evidence of past or present exposure to an agent (antibody).
Is a test useful?
There are several guidelines to determine if a diagnostic test is clinically useful when examining a report of a new diagnostic test:
•Has there been an independent masked comparison to a “gold standard” of diagnosis?
•Has the test been evaluated in patients with acute versus chronic disease, mild versus severe disease, treated versus untreated animals and animals with other similar, but different, disorders?
•Was the setting of the study adequately described?
•Has the reproducibility (repeatability) of the test been determined?
•Has the interpretation of the precision (observer variation) been determined?
•Has “normal” been defined sensibly for this test? Is the normal range representative of the population?
•If this test has been used in a sequence of tests, has its contribution to the overall validity of the diagnostic effort been determined?
•Based on the information provided, would you be able to replicate this test in a population?
•Has the utility of the test been determined?
The diagnostic test, if useful in the diagnostic process, should provide an accurate diagnosis, support application of specific treatments and hopefully should lead to a better clinical outcome.
Examples of reasons for false positive (FP) and false negative (FN) test results
False positive (FP):
•Group cross-reactions between antibodies to different organisms with similar epitopes on antigens (e.g., Mycobacterium sp.)
•Nonspecific inhibitors that mimic the effects of an antibody in its absence
•Agglutination of antigen by nonspecific agglutinins
False negative (FN):
•Improper timing due to stage of infection or appearance of an antibody
•Natural or induced tolerance to the antigens (e.g., animals persistently infected with BVD virus)
•Improper selection of the test, such as a neutralizing antibody test when that type of antibody is not produced
•Nonspecific inhibitors (e.g., anti-complementary serum, toxic substances)
•Incomplete or blocking antibodies (e.g., complement fixation testing for Brucella)
•Antibiotic-induced immunoglobulin suppression
Serologic diagnostic tests may be qualitative or quantitative. With a qualitative test, the result is either positive or negative. This makes for easy determination of the sensitivity and specificity of the test when compared to a “gold standard.” However, if test results are reported on a continuous scale, the choice of cutoff point, and any change to it, may lead to tremendous changes in sensitivity, specificity, false negatives and false positives.
We look at a population of animals and classify them as either healthy or diseased. When examining an antibody response in populations, there will usually be some overlap, as some animals that appear healthy are in fact diseased (and vice versa). This overlap may be due to laboratory error or perhaps a failure in the validation process. When the test result is on a continuous scale, we must establish a cutoff point that most accurately separates healthy from sick animals.
We will use three different cutoff points to illustrate the changes that occur in sensitivity and specificity, with resultant changes in false negative and positive rates, depending on the cutoff point.
The test result shown in Figure 1* ranges from lowest to highest when observing from left to right. If we set the cutoff point at point A, the resulting test is highly sensitive (100 percent), so there are no false negatives. However, the specificity is lower, and there are a large number of false positive test results. This test would be useful as a screening test for a disease where the cost of a FN is high.
Suppose we set the cutoff at point B. The result would be a test with equal sensitivity and specificity as well as the same number of FPs and FNs. The usefulness of a particular test with these characteristics would be questionable.
Lastly, we will set the cutoff point at point C. This cutoff leads to an insensitive test where the specificity is extremely high (100 percent). This type of test would be useful in a situation where the cost of a FP is high.
As you can see, when test results are recorded on a continuous scale, increasing the specificity of a test lowers its sensitivity, and vice versa. This trade-off underscores the importance of understanding how a test was developed, the conditions under which it was studied, and the precision and validity of the test.
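The effect of moving the cutoff can be sketched in a few lines of Python. The readings below are hypothetical values made up for illustration (they are not the data behind Figure 1), and the three cutoffs are chosen only to mimic points A, B and C.

```python
# Hypothetical test readings on a continuous scale (illustrative only).
healthy = [1, 2, 2, 3, 3, 4, 4, 5, 6, 6]
diseased = [4, 5, 6, 6, 7, 7, 8, 8, 9, 10]


def sens_spec(cutoff, healthy, diseased):
    """Call readings >= cutoff positive; return (Se, Sp, FP count, FN count)."""
    tp = sum(x >= cutoff for x in diseased)
    fn = len(diseased) - tp
    tn = sum(x < cutoff for x in healthy)
    fp = len(healthy) - tn
    return tp / len(diseased), tn / len(healthy), fp, fn


# Assumed cutoffs standing in for points A (low), B (middle) and C (high).
cutoffs = {"A": 4, "B": 5.5, "C": 7}
results = {name: sens_spec(c, healthy, diseased) for name, c in cutoffs.items()}

for name, (se, sp, fp, fn) in results.items():
    print(f"cutoff {name}: Se={se:.0%} Sp={sp:.0%} FP={fp} FN={fn}")
```

With these made-up readings, the low cutoff catches every diseased animal at the price of many false positives, the high cutoff does the reverse, and the middle cutoff balances the two error types.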
Using a test
The sensitivity and specificity of a test are generally considered to be fixed, but they yield different predictive values depending on the prevalence of disease in the population you are examining. This can be used to your advantage, based on the information you want to generate. For example, you may elect not to test for a condition where the prevalence of the disease is low because it may be very difficult to interpret a positive test result.
However, a negative test result in that setting can be very valuable, because it confirms your belief that the animal is not diseased. As the prevalence of disease increases, so does the positive predictive value. Conversely, as the prevalence decreases, the negative predictive value increases. This information is important for application of diagnostic tests in both the individual animal and in larger populations.
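The dependence of predictive values on prevalence follows directly from the definitions, and a short Python sketch makes it concrete. The sensitivity and specificity of 90 percent and the three prevalence levels are illustrative assumptions, not figures for any particular test.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) for a test applied at a given prevalence."""
    tp = sensitivity * prevalence              # diseased, test positive
    fn = (1 - sensitivity) * prevalence        # diseased, test negative
    tn = specificity * (1 - prevalence)        # healthy, test negative
    fp = (1 - specificity) * (1 - prevalence)  # healthy, test positive
    return tp / (tp + fp), tn / (tn + fn)


# Same (assumed) test, three assumed prevalence levels.
for prev in (0.01, 0.10, 0.50):
    ppv, npv = predictive_values(0.90, 0.90, prev)
    print(f"prevalence {prev:.0%}: PPV {ppv:.1%}, NPV {npv:.1%}")
```

The test itself never changes in this sketch; only the population does, and that alone moves the positive predictive value from under 10 percent to 90 percent.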
A large number of the tests we use are able to detect antibodies; however, our dilemma is whether the antibody reflects the agent currently producing the clinical signs or a past infection. Alternatively, is the reaction we see a cross-reaction to some other agent? Seldom do we think about the test in terms of what information it is going to provide, or base our recommendations on the prevalence of the disease in a particular practice area.
The prevalence of disease in the population being tested is commonly referred to in epidemiology circles as the pre-test probability of the disease in question. Prevalence of disease is different for every area and will differ in the healthy animal versus the sick. In healthy animals, we often screen groups to determine exposure or to rule out disease; in other words, prove they are healthy. In this regard, we like a very sensitive test so there are infrequent false negatives and, therefore, we have less trouble believing a negative test result.
In the sick animal, we want the test to prove the animal is truly sick (rule in disease). Therefore, we like to have a very specific test, which reduces the number of false positives, so that we are more inclined to believe a positive test result.
Example – bovine tuberculosis
The most common cause of tuberculosis in cattle is Mycobacterium bovis. Most often, the cattle may be infected with the organism and not exhibit clinical signs of disease. In other cases, the clinical signs may be very nonspecific and, therefore, difficult to diagnose. Mild respiratory signs are reasonably common, as this is the most frequent route of infection.
Although most cattle with tuberculosis do not exhibit clinical signs, they pose a serious health hazard to other livestock, as well as to humans. The organism may be spread via exhaled droplets, sputum, feces, milk, urine, vaginal discharge, semen and draining lymph nodes.
There are other Mycobacterium species, such as M. avium and M. tuberculosis, which may cause infections; however, their primary importance is that they interfere with diagnostic testing in eradication programs through cross-reactions.
Tuberculosis has essentially been eradicated in most areas of the United States and many other countries by implementing testing programs and using less-than-perfect testing modalities.
The first test used is called the caudal fold test (CFT). If the test result is either a suspect (reactor) or positive, the CFT is followed up with the short interval comparative tuberculin test (SICTT), sometimes referred to as the comparative cervical test. If the SICTT is positive, the animal is slaughtered and tissues are collected for culture of the organism and histopathology. Following up suspect or positive animals with further testing is called “testing in series.”
What is series testing?
In series testing, tests are conducted consecutively, based on the results of the previous test. In essence, only the animals that test positive receive further testing, which maximizes specificity and the positive predictive value. However, the downside is that in most cases when you increase specificity you decrease sensitivity and, therefore, reduce the predictive value of a negative test. Series testing places much more credence in the value of a positive test, but it is possible that more disease will be missed.
Another common expression used to explain the reasons for using series tests is asking the animal to prove it is affected. Series testing is an important part of disease eradication programs, as it helps prevent unnecessary removal of false positive animals.
An example of testing in series for tuberculosis
Currently, the prevalence of tuberculosis in the United States is 0.2 percent. The sensitivity of the caudal fold test (CFT) ranges between 68 and 95 percent, but for the purpose of this example we will use 80 percent as the sensitivity. The specificity of the CFT ranges between 96 and 98.8 percent, but for purposes of this example we will use 96 percent. The total number of animals tested in this example is 10,000. Results from the initial tests are shown in Table 1*.
In this instance, the predictive value of the positive test is 16/415=3.86 percent. If all of the positive cattle were removed from the farms and destroyed, more than 96 percent of them (399 of 415) would be removed unnecessarily. Therefore, a second test is performed on the positive animals only. This second test is utilized to enhance the positive predictive value by increasing the specificity and to reduce the number of false positive animals.
In order to increase the specificity of the testing process, the cattle are subjected to a second test procedure, the short interval comparative tuberculin test (SICTT). The SICTT has a sensitivity of between 77 and 95 percent and a specificity of greater than 99 percent. For the purpose of this example, we will use a sensitivity of 86 percent and a specificity of 99 percent. We will thus perform the second test on the animals that were positive on the first test. The SICTT is both more sensitive and more specific than the CFT. However, due to the difficulty of performing this test, it would not be easily applied as a first-line screening test. Results of the second tests are shown in Table 2*.
After the results of the second test, 18 animals would be removed from the population. Of the 18 animals, 14/18 (77.8 percent) would be true positives and 4/18 (22.2 percent) would be false positive animals. However, instead of removing 399 animals unnecessarily, we would only remove four unnecessarily. There is a caveat in that we have now reduced our overall sensitivity to 70 percent (14/20), but we have increased our specificity to 9,976/9,980 (99.96 percent). The above examples are the essence of series testing.
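The arithmetic of this series-testing example can be retraced in a short Python sketch. It starts from the first-test counts quoted in the text (16 true positives and 399 false positives among 10,000 animals) rather than from Table 1, which is not reproduced here, and it assumes fractional animals are rounded to the nearest whole animal.

```python
total = 10_000
infected = round(total * 0.002)   # 0.2 percent prevalence -> 20 animals
healthy = total - infected        # 9,980 animals

# Counts after the first test (CFT), as quoted in the text.
tp1, fp1 = 16, 399
ppv1 = tp1 / (tp1 + fp1)          # predictive value of a positive CFT

# Second test (SICTT) is applied only to first-test positives.
se2, sp2 = 0.86, 0.99
tp2 = round(tp1 * se2)            # 13.76 rounds to 14 true positives
fp2 = round(fp1 * (1 - sp2))      # 3.99 rounds to 4 false positives

removed = tp2 + fp2
ppv2 = tp2 / removed
overall_se = tp2 / infected
overall_sp = (healthy - fp2) / healthy

print(f"PPV after CFT only:  {ppv1:.2%}")
print(f"Animals removed:     {removed}")
print(f"PPV after SICTT:     {ppv2:.1%}")
print(f"Overall sensitivity: {overall_se:.0%}")
print(f"Overall specificity: {overall_sp:.2%}")
```

Laying the steps out this way shows where the trade-off comes from: every animal must pass both tests to be called positive, so false positives fall sharply while a few truly infected animals drop out at each stage.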
Although application of these tests is the basis for the tuberculosis eradication program in the United States, as you can see, they are not perfect. However, the tests have worked well in combination to produce a very successful program. These examples are a good illustration of testing in series, and the principles can be applied in the next example used in the detection of Johne’s disease.
Diagnostic testing in parallel
Parallel testing involves conducting two or more tests on an animal or group of animals. If any of the tests are positive, the animal is considered to be affected. Therefore, the second test is usually applied to the animals that were negative from the first diagnostic test.
Often, parallel testing is used in emergency situations in which a quick decision is necessary and two or more tests may be applied simultaneously. Parallel testing increases sensitivity and the negative predictive value, but reduces the specificity and the positive predictive value. Use of parallel testing essentially asks the animal to prove it is healthy. Parallel testing is useful when there is an important penalty for missing a disease (false negatives).
Example of testing in parallel
Suppose we have a quick antibody-based screening test (Test A) for a particular disease with a sensitivity of 65 percent and a specificity of 90 percent. Another antibody-based test (Test B) that is available, but more expensive, is used when you want to increase the sensitivity (sensitivity=90 percent; specificity=90 percent). For the purpose of this illustration, the total number of animals tested is 1,000 and the prevalence of this particular disease is 10 percent. Results of this example are shown in Table 3*.
From this initial screening test, the predictive value of a positive test was 65/155=41.9 percent and the negative predictive value was 810/845=95.9 percent. We would then apply the new test to those samples that were negative on the screening test; results are shown in Table 4*.
The second test has identified an additional 31 true positive animals and 81 false positive animals. After completion of the second test, the final results of the two tests for this particular disease are shown in Table 5*.
After both tests have been completed, the overall sensitivity has increased to 96/100=96 percent and the overall specificity has decreased to 729/900=81 percent. In addition, the positive predictive value of the overall testing process has been reduced from 41.9 percent to 36 percent (96/267). In contrast, the negative predictive value has increased from 95.9 percent to 99.5 percent (729/733). We thus have more confidence in the value of a negative test and less confidence in the value of a positive test.
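The parallel-testing figures above can be reproduced with a small Python sketch. The omitted tables are not used; the counts are rebuilt from the stated sensitivities, specificities and prevalence. One assumption is needed: 90 percent of the 35 animals missed by Test A is 31.5, and the sketch truncates this to 31 to match the count given in the text.

```python
total, prevalence = 1_000, 0.10
infected = round(total * prevalence)   # 100 animals
healthy = total - infected             # 900 animals

se_a, sp_a = 0.65, 0.90                # Test A (screening)
se_b, sp_b = 0.90, 0.90                # Test B (more expensive)

# Test A on all animals.
tp_a = round(infected * se_a)          # 65
fn_a = infected - tp_a                 # 35
tn_a = round(healthy * sp_a)           # 810
fp_a = healthy - tn_a                  # 90

# Test B only on animals that were negative on Test A.
tp_b = int(fn_a * se_b)                # 31.5 truncated to 31, as in the text
tn_b = round(tn_a * sp_b)              # 729
fp_b = tn_a - tn_b                     # 81

# An animal is positive overall if either test is positive.
tp, fp = tp_a + tp_b, fp_a + fp_b
fn, tn = infected - tp, tn_b

se, sp = tp / infected, tn / healthy
ppv, npv = tp / (tp + fp), tn / (tn + fn)
print(f"Se={se:.0%} Sp={sp:.0%} PPV={ppv:.0%} NPV={npv:.1%}")
```

Because a positive on either test counts, the false positives from both tests accumulate, which is why specificity and positive predictive value fall while sensitivity and negative predictive value rise.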
Depending on the nature of the testing modalities used, the change in positive predictive value may not occur. For example, where the main test or the comparative test is detection of antigen (such as culture), the specificity is 100 percent and, therefore, the positive predictive value will remain at 100 percent, regardless of how many times you repeat the test. However, the sensitivity will increase. The following example will demonstrate this concept.
There is a problem with the sensitivity of tests used for diagnosis of Johne’s disease because the tests are dependent upon the stage of infection with M. paratuberculosis. Different procedures are sometimes utilized on the same sample in order to increase the detection of the organism (sensitivity).
For example, suppose we have a 1,000-cow dairy herd with a prevalence of 10 percent infection with M. paratuberculosis. In this example, we will use conventional fecal culture as the initial test (sensitivity=16.5 percent; specificity=100 percent), followed by the two-stage system using a two-step centrifugation technique (sensitivity=29.4 percent; specificity=100 percent). The numbers would be illustrated as shown in Table 6*.
We would then apply the second test (two-stage) to the samples that were negative in the first test (conventional). The sample results would be as shown in Table 7*.
Using the above examples, the overall testing results are compiled in Table 8*.
Therefore, the use of parallel testing in this example increased the overall sensitivity to 41 percent from 16.5 percent; the specificity did not change and the negative predictive value increased from 91.6 to 93.8 percent.
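When both tests are 100 percent specific, the combined result can also be written as a general formula instead of a table of counts. The Python sketch below does this with proportions, taking the second test's sensitivity to apply to the animals missed by the first test, as in the example. The function name is hypothetical, and because it does not round to whole cows, its output differs from the figures above by about a tenth of a percentage point.

```python
def parallel_perfect_specificity(se_first, se_second, prevalence):
    """Combine two 100-percent-specific tests applied in parallel.

    Returns (combined sensitivity, NPV after first test, NPV after both).
    With no false positives, the PPV stays at 100 percent throughout.
    """
    missed_first = 1 - se_first                      # infected, negative on test 1
    missed_both = missed_first * (1 - se_second)     # infected, negative on both
    se_combined = 1 - missed_both
    healthy = 1 - prevalence
    npv_first = healthy / (healthy + prevalence * missed_first)
    npv_both = healthy / (healthy + prevalence * missed_both)
    return se_combined, npv_first, npv_both


# Conventional fecal culture followed by the two-stage technique.
se_combined, npv_first, npv_both = parallel_perfect_specificity(0.165, 0.294, 0.10)
print(f"combined sensitivity: {se_combined:.1%}")
print(f"NPV after first test: {npv_first:.1%}")
print(f"NPV after both tests: {npv_both:.1%}")
```

The formula makes the limit of this approach visible: each added test can only find a share of the animals still being missed, so sensitivity climbs with diminishing returns.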
References omitted due to space but are available upon request.
—From Ohio State University, College of Veterinary Medicine
*Figure and Tables omitted but are available upon request to firstname.lastname@example.org.
William Saville, Extension Epidemiologist, Department of Veterinary Preventive Medicine, Ohio State University