Editorial and reviews 
 Viewpoint Volume 354, Number 9191 13 November 1999

Why we need large, simple studies of the clinical examination: the problem and a proposed solution

Finlay A McAlister, Sharon E Straus, David L Sackett, on behalf of the CARE-COAD1 Group*

The problem stated

All physicians recognise (and most acknowledge) the importance of the initial clinical examination (history and physical) in the care of patients. The examination has several vital functions, in addition to establishment of rapport. For example, it usually provides a diagnosis: 88% of all diagnoses achieved in primary care1 and 73% in a general medicine clinic2 are established by the end of the initial history and physical examination. The initial clinical examination also allows us to identify the severity of symptoms, determine prognosis, and monitor therapy. Since the results of the initial clinical assessment allow us to modify our pretest calculations of probability of disease in our patients and to tailor subsequent investigations appropriately, a thoughtful examination can improve efficiency and lower the costs of care. The clinical examination is usually crucial in detection of disease in symptom-free patients (eg, hypertension) at an early, more easily treatable stage.

The specificity of signs (eg, right lower quadrant tenderness in appendicitis) and symptoms (eg, chest pain in coronary heart disease) tends to decrease as patients are referred from primary to secondary to tertiary care.3 This explains and to an extent justifies early definitive investigations. However, there are strong arguments against devaluing the examination and proceeding directly to definitive investigations, especially in teaching institutions, most of whose trainees need to be prepared for careers in primary care. Furthermore, definitive investigations are available to only a small fraction of patients worldwide, and the opportunity costs of definitive investigations are high in most settings and prohibitive in many. In addition, the interpretation of definitive investigations usually depends on the results of the clinical examination. For example, for a 45-year-old woman with 1·8 mm ST depression on exercise electrocardiography the likelihood of clinically important coronary stenosis varies twenty-fold, from less than 5% to more than 80%, depending on her clinical history.4 Definitive investigations may not be as definitive as we think, as shown by the lack of agreement between expert histopathologists on whether biopsies show aggressive breast cancer, piecemeal necrosis of the liver, or invasive melanoma.5 Indeed, the clinical examination sometimes even tells us whether or not to believe the result of a definitive investigation. For example, a quick and simple examination for nine features of deep venous thrombosis can identify subsets of patients with extremely low (3%) and high (75%) chances of clinically important deep venous thrombosis.6 As a result, we would not believe a positive compression ultrasound in the former group, nor a negative ultrasound in the latter, but would insist that both groups undergo venography.6

Despite the central importance of the history and physical examination to the clinical process, their accuracy and precision have rarely been assessed rigorously. The scientific criteria used by evidence-based journals to assess the validity of studies of diagnostic tests are outlined in panel 1.7 The accuracy of a symptom or sign refers to the degree to which it reflects the attribute that it is said to represent. Accuracy can be described in terms of sensitivity (the proportion of patients with the target disorder who have the symptom or sign), specificity (the proportion of patients without the target disorder who do not have the symptom or sign), or likelihood ratios (LR).8 The LR expresses the probability that a given finding will occur in a patient with, as opposed to without, the target disorder. The key advantage of LRs is that they can be generated for different degrees of intensity of a finding, and for its presence or absence. LRs greater than 10 virtually rule in a diagnosis, LRs of less than 0·1 virtually rule it out, and LRs of around 1 mean that no useful information has been obtained from the clinical finding.7
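
As a rough illustration of how these quantities relate, the sketch below derives likelihood ratios from sensitivity and specificity and applies them to a pretest probability through the odds form of Bayes' theorem. The function names and the numbers in the usage lines are illustrative assumptions, not data from the studies cited here.

```python
import math


def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) for a finding that is either present or absent."""
    for value in (sensitivity, specificity):
        if not 0.0 <= value <= 1.0:
            raise ValueError("sensitivity and specificity must lie in [0, 1]")
    false_positive_rate = 1.0 - specificity
    false_negative_rate = 1.0 - sensitivity
    # Assumption: a zero denominator is reported as an infinite ratio.
    lr_positive = (sensitivity / false_positive_rate
                   if false_positive_rate > 0 else math.inf)
    lr_negative = (false_negative_rate / specificity
                   if specificity > 0 else math.inf)
    return lr_positive, lr_negative


def post_test_probability(pretest_probability, likelihood_ratio):
    """Convert a pretest probability to a post-test probability via odds."""
    if not 0.0 < pretest_probability < 1.0:
        raise ValueError("pretest probability must lie strictly between 0 and 1")
    if math.isinf(likelihood_ratio):
        return 1.0
    pretest_odds = pretest_probability / (1.0 - pretest_probability)
    post_test_odds = pretest_odds * likelihood_ratio
    return post_test_odds / (1.0 + post_test_odds)


# Hypothetical finding: sensitivity 60%, specificity 90%, pretest probability 20%.
lr_pos, lr_neg = likelihood_ratios(0.60, 0.90)
print(f"LR+ = {lr_pos:.1f}, LR- = {lr_neg:.2f}")
print(f"probability if finding present: {post_test_probability(0.20, lr_pos):.2f}")
print(f"probability if finding absent:  {post_test_probability(0.20, lr_neg):.2f}")
```

With these illustrative numbers the positive LR is 6 and the negative LR about 0·44, so a pretest probability of 20% rises to 60% when the finding is present and falls to 10% when it is absent.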

Panel 1: Levels of evidence for studies of diagnosis*
1 An independent, masked comparison with reference standard among an appropriate population of consecutive patients.
2 An independent, masked comparison with reference standard among non-consecutive patients or confined to a narrow population of study patients
3 An independent, masked comparison of an appropriate population of patients, but reference standard not applied to all study patients
4 Reference standard not applied independently or masked.
5 Expert opinion with no explicit critical appraisal, based on physiology, bench research, or first principles.
*Level 1=most rigorous, level 5=least rigorous.

Precision is another key measure of clinical skill, which describes the degree of agreement between different observers (or the same observer at different times) performing and interpreting a test. Since some agreement would be expected to occur by chance alone, precision is usually corrected for chance and expressed by the κ statistic. By convention, a κ statistic of less than 0·4 shows poor agreement, κ of 0·4-0·6 shows fair agreement, 0·6-0·8 moderate agreement, and greater than 0·8 excellent agreement.
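
A minimal sketch of the chance correction, assuming two observers who each record one categorical finding per patient (Cohen's form of the statistic). The observer data are invented for illustration, and assigning the boundary values 0·4 and 0·6 to the higher band is an assumption, since the convention quoted above leaves them ambiguous.

```python
def cohen_kappa(ratings_a, ratings_b):
    """Chance-corrected agreement between two observers rating the same patients."""
    ratings_a, ratings_b = list(ratings_a), list(ratings_b)
    if not ratings_a or len(ratings_a) != len(ratings_b):
        raise ValueError("both observers must rate the same non-empty set of patients")
    n = len(ratings_a)
    # Observed agreement: proportion of patients given the same rating by both.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Expected agreement by chance, from each observer's marginal frequencies.
    categories = set(ratings_a) | set(ratings_b)
    expected = sum(
        (ratings_a.count(c) / n) * (ratings_b.count(c) / n) for c in categories
    )
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is complete")
    return (observed - expected) / (1.0 - expected)


def describe_agreement(kappa):
    """Label kappa using the convention quoted in the text (boundaries assumed)."""
    if kappa < 0.4:
        return "poor"
    if kappa < 0.6:
        return "fair"
    if kappa <= 0.8:
        return "moderate"
    return "excellent"


# Hypothetical data: 1 = sign judged present, 0 = absent, for ten patients.
observer_1 = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
observer_2 = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
kappa = cohen_kappa(observer_1, observer_2)
print(f"kappa = {kappa:.2f} ({describe_agreement(kappa)})")
```

Here the two observers agree on 8 of 10 patients (80%), but 52% agreement would be expected by chance from their marginal frequencies, so κ is about 0·58 and falls in the "fair" band.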

To take chronic obstructive airways disease as an example, a scan of 21 textbooks of physical examination published between 1957 and 1998 (list available from FAM) yielded 40 different physical signs recommended for use in differentiating patients with chronic obstructive airways disease from those with normal pulmonary function. Although one book described 16 signs, the median number of signs was nine, and no textbook reported the precision or accuracy of these signs. We undertook a literature search of Medline 1966-98 and Embase 1980-98 using the following medical subject headings: EXP medical history taking, or EXP physical examination, or EXP clinical examination, or EXP signs; and EXP diagnosis or EXP sensitivity or EXP specificity; and EXP lung diseases--obstructive or EXP airway limitation or EXP airway obstruction. No language restrictions were used. The titles and abstracts of the 224 identified articles (and the full text of those judged to be potentially eligible) were reviewed by FAM and SES for potential inclusion. The reference lists of included articles and textbooks were searched, and experts in the field were contacted to identify other relevant articles. We identified 29 relevant articles on the clinical detection of chronic obstructive airways disease (list available from FAM). A total of 32 clinical signs were tested (median 1 sign [range 1-13] per study) with a median of only two physicians examining a median of less than 100 patients. Moreover, only one of these studies9 could be classified as a level 1 study (panel 2).

Panel 2: Clinical signs of chronic obstructive airways disease tested (% of studies)
Inspection: Prolonged expiration (31%), accessory muscle use (24%), blowing out a match test (17%), barrel chest (14%), retraction of lower ribs on inspiration (Hoover's sign [10%]), suprasternal/supraclavicular/intercostal indrawing (10%), loss of bucket handle movement of upper ribs (10%), jugular venous filling during exhalation (7%), reduced chest expansion (7%), movement of chest en-bloc (7%), increased anteroposterior diameter (7%), epigastric indrawing (3%), short neck due to elevated shoulders (3%), horizontal ribs (3%), abdominal and latissimus dorsi muscles contract on exhalation (3%), spinal kyphosis (3%).
Palpation: Increased tracheal descent during inspiration (14%), low lying thyroid (10%), minimal or absent cardiac impulse (10%), reduced fremitus (3%), cardiac impulse best felt in epigastrium (3%).
Percussion: Hyperresonance (17%), reduced diaphragmatic excursion (7%), diminished cardiac dullness (7%), low lying diaphragm (3%), diminished liver dullness (3%).
Auscultation: Wheezes (28%), reduced breath sounds in the chest (28%), crackles (24%), other adventitious sounds (3%), increased upper-airway sounds (3%).

The physical signs most often cited in these textbooks and studies were: reduced breath sounds on auscultation (22 of 50 authors), hyperresonance (21), wheezes (21), and prolonged expiration (18). Of the studies testing the four most commonly cited signs, less than one in seven reported their precision and less than a third determined their accuracy (table 1).


Sign                   Precision                         Accuracy
                       % studies         Inter-observer  % studies              Positive          Negative          Sensitivity  Specificity
                       reporting (refs)  κ statistic     reporting (refs)       likelihood ratio  likelihood ratio  (%)          (%)
Reduced breath sounds  10 (9-11)         0·23-0·47       24 (9,11,17-21)        1·9-16·3          0·4-0·8           29-100       79-96
Hyperresonance         7 (10,11)         0·04-0·43       14 (11,12,17,22)       2·9-5·3           0·5-0·7           30-58        86-94
Wheezes                14 (9-11,17)      0·43-0·70       24 (9,11,17-19,21,23)  0·9-∞             0·5-1·2           9-100        37-100
Prolonged expiration   10 (9,11,13)      0·23-0·81       31 (9,11,13-19)        0·9-15·2          0·1-1·0           6-92         43-99
Table 1: Range of operating characteristics for clinical signs in prediction of obstructive airways disease

Moreover, because of the small size of these studies, the CIs around their measures of precision and accuracy are huge, which renders their usefulness indeterminate (table 1). For example, although respiratory examiners usually do better than the expert pathologists described above, their degree of agreement is in the "poor" category for these commonly quoted signs. The accepted belief that diagnosis depends on combinations of symptoms, signs, and "art" is not supported by these studies. Three of them9,11,12 assessed the precision of the overall clinical impression and documented between-clinician agreements that were no better (κ 0·36-0·48) than those for the individual clinical signs. Indeed, in a study of within-clinician agreement on the respiratory examination (the same physician examining the same patient twice), Mulrow and colleagues10 showed that physicians disagreed with their own previous examination in up to 25% of cases. Similarly, the accuracy of these popular signs for chronic obstructive airways disease varies greatly between studies, and none are diagnostic on their own (table 1). Also, although the overall clinical impression is accurate in some studies (69-95%), its sensitivity (50-64%), specificity (64-93%), and likelihood ratios (1·4-7·3 for a positive impression, 0·4-0·8 for a negative impression) vary sharply between studies and fall far short of an ideal diagnostic test.9,11,12,18,21,24,25

So that the reader is not left with the false impression that these limitations of the literature are unique to chronic obstructive airways disease, table 2 shows the amount of high quality literature on various other elements of the clinical examination identified by systematic reviews published in the Rational Clinical Examination series. This series was launched in 1992 in JAMA and aimed to summarise the literature on various aspects of the clinical examination, and to draw attention to those areas in which there was a paucity of evidence. 30 articles (including 26 systematic reviews) have now been published in that series (list available at http://www.sgim.org/interestgroups/clinexam.html), but this number is only about 50% of the number of reviews originally commissioned. The editors of the series pointed out that the other reviews were not completed because investigators were "unable to locate evidence about their topic or the identified evidence was of low quality".38 In fact, even some of the published reviews39,40 were unable to find any level 1 studies, and 16 (62%) of the reviews showed major gaps in the literature within their subject area. The level 1 studies identified were generally small and inconclusive (table 2).


Detection at clinical examination                     High-quality studies  Level 1 accuracy  Median number of  Median number of
                                                      of precision*         studies           clinicians per    patients per
                                                                                              level 1 study     level 1 study
General
Malnourishment26                                      2                     2                 3                 152
Deep venous thrombosis27
  by individual elements of the clinical examination  0                     4                 3                 101
  by clinical prediction guide                        0                     2                 9                 561
Sinusitis28                                           2                     2                 2                 206
Cardiovascular
Assessment of central venous pressure29               1                     3                 5                 62
Aortic systolic murmur30                              4                     1                 1                 781
Left-sided heart failure31                            3                     9                 4†                200
Acute myocardial infarction32                         3                     7                 4†                492
Renal artery stenosis33                               0                     3                 3                 118
Abdominal aortic aneurysm34                           0                     15                2                 168
Abdominal
Assessment of vertical liver span35                   2                     4                 2                 58
Ascites36                                             2                     3                 3                 63
Splenomegaly37                                        2                     5                 2                 99
Studies graded "level 1" by systematic review authors have been independently verified and data abstracted for construction of this table. *Must include two or more independent masked observers assessing an appropriate range of patients with that disorder. †Includes secondary analyses of databases constructed for studies not primarily designed to investigate elements of the clinical examination (such as baseline data collected for clinical trials or cohort studies).
Table 2: Volume of high-quality research on the clinical examination in adult patients

Thus, clearer information is needed on the precision and accuracy of the clinical examination, both in general and in various clinical settings. Moreover, there is very little information on how clinicians' precision and accuracy change with training or with increased clinical experience. Clearly we need large, methodologically robust studies on the history and physical examination.

Why are there no better studies in this area? Sackett and Rennie41 offered five reasons why "investigations into the precision and accuracy of the clinical examination [have] lagged behind similar studies of laboratory tests". First, these studies are challenging to design and difficult to operate (the only level 1 study in chronic obstructive airways disease9 enrolled only three patients a week). The specificity of symptoms and signs is generally low in the usual study sites (tertiary-care academic centres), and such studies are difficult to mount in primary care. Studies of precision can be threatening to authority, and, as pointed out by Fletcher four decades ago, physicians "seldom hunt in couples of equal seniority".12 Second, since a diagnosis rarely arises from one symptom or sign, it is inappropriate to try to dissect the precision and accuracy of individual signs for most diagnostic decisions, and it is beyond the methodological skills of most investigators to do the multivariate analyses required to analyse several of them simultaneously. Third, because most academic-based researchers spend little time at the bedside, it should not surprise us that they show little inclination to investigate the history or the physical examination.42 Fourth, the realities of modern clinical practice, with its pressure to see larger numbers of patients in shorter times, are a powerful disincentive to taking careful histories and doing thorough physical examinations--in many cases it is more efficient for the physician (but not for the health-care system) to order the "definitive" investigation. Fifth, such research is unpopular when it challenges authority and the tenets of the "art of medicine". Simel and Rennie38 have also pointed out that it is difficult to secure funding for this kind of research.

A possible solution

If this state of affairs is to change, we must begin high-quality studies on a very large scale (so that our estimates of precision and accuracy have narrow CIs). Moreover, such studies have to be done among real-world patients in primary, secondary, and tertiary care (to show any changes in accuracy as patients move along the referral pathway). The studies must involve the broad array of real-world clinicians at various levels of training and experience, who are the targets for what is learned.
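
To make the link between study size and CI width concrete, the sketch below computes a 95% Wilson score interval for an observed proportion such as sensitivity. The choice of the Wilson method, the function name, and the patient counts are assumptions made for illustration; they are not the analysis plan of any study described here.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a proportion (z=1.96 gives roughly 95%)."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    p = successes / n
    denominator = 1.0 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denominator
    half_width = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denominator
    return centre - half_width, centre + half_width


# Hypothetical: the same observed sensitivity (60%) in studies of different size.
for diseased_with_sign, diseased_total in [(12, 20), (60, 100), (600, 1000)]:
    low, high = wilson_interval(diseased_with_sign, diseased_total)
    print(f"n={diseased_total:4d}: 95% CI {low:.2f} to {high:.2f} "
          f"(width {high - low:.2f})")
```

With 20 diseased patients the interval around an observed sensitivity of 60% runs from roughly 0·39 to 0·78; with 1000 it narrows to roughly 0·57 to 0·63.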

Encouraged by the success of practice-based networks in primary care, which have shown that geographically dispersed clinicians can do rigorous research with the help of a central coordinating centre,43-46 we have started to create an international collaborative group to design and run large, simple studies of the accuracy and precision of the clinical examination. By "simple" studies, we borrow from the terminology of clinical trialists to imply studies that enrol unselected consecutive patients and collect only a minimum of data.47 By recruitment via our centre's website, through e-mail discussion groups (such as "evidence-based health"), and by word of mouth, we have recruited over 300 clinicians from 26 countries to form the CARE (Clinical Assessment of the Reliability of the Examination) interest group. CARE is open to clinicians at any stage of training or experience and in any setting, and is structured so that any member can nominate symptoms or signs for assessment and broadcast them to the rest of the group by e-mail. Members who share that interest will design and debug the protocol, enrol patients in the study (members are expected to adhere to local clinical and ethical practice), and report their results via the internet (with instantaneous data checking and editing) to the study coordinating centre. Results will be analysed, disseminated to the investigators, and synthesised into reports with multiple authors.

To test the feasibility of this idea, we nominated a study of the accuracy of six clinical items in diagnosis of chronic obstructive airways disease, validated against independent blind spirometry at the same visit (COAD-1). 26 clinicians (or groups) from 14 countries quickly joined the pilot study, and after deciding on the protocol they enrolled 332 patients in 5 weeks (over 20 times the rate achieved in the only other high-quality study of the disorder). No protocol violations were apparent from the data or from communication with the investigators. After the study was closed, a random sample of investigators was asked to fax through the original data-collection sheets, and checks with the database showed that the error rate was less than 2%. Preliminary analyses are being made by the collaborators, and have already led to the design of the next study in the series.

We intend to expand CARE to a size and scope such that large (>100 clinicians enrolling >1000 patients), simple (<2 min per patient and <15 patients per participating clinician), and fast (<2 weeks, with data entry and instant editing via the internet) studies can be made of the clinical examination. This essay is an invitation to clinical colleagues around the world to join our fledgling enterprise to better our knowledge and performance of the clinical examination. Full details of the CARE interest group and current projects are available at: http://www.carestudy.com.

CARE-COAD1 Investigators--A Baeza, X Cea (Universidad de la Frontera, Temuco, Chile); C Baicus (N Gh Lupu Hospital, Bucharest, Romania); M Bermudez-Gomez, R Dennis (Pontificia Universidad Javeriana, Bogota, Colombia); E Etchells (The Toronto Hospital, Toronto, Canada); O Gajic (New York Methodist Hospital, Brooklyn, USA); J Ibarra (Vitoria, Spain); A Jelani (King Fahd National Guard Hospital, Riyadh, Saudi Arabia); M Kljakovic (General Practice Department, Wellington School of Medicine, New Zealand); De Londono (Bogota, Colombia); F McAlister, D Sackett, S Straus (John Radcliffe Hospital, Oxford, UK); M Molinari, M Urtasun (Hospital Regional Ushuaia, Ushuaia, Argentina); E M Mutzig (University of Oklahoma College of Medicine, Tulsa, USA); D Newberry (Ashford Hospital, London, UK); A Ramos (Clinica Puerta de Hierro, Universidad Autonoma de Madrid, Madrid, Spain); D Ross (Chorley, UK); A Ruiz (Hospital San Ignacio, Bogota, Colombia); S Salah (Al Ain Hospital, Al Ain, United Arab Emirates); I Scott (Princess Alexandra Hospital, Brisbane, Australia); P Sestini (Institute of Respiratory Diseases, University of Siena, Italy); K Sharma (Ottawa Hospital, Canada); D Teysseyre (St Sulpice de Royan, France).

Acknowledgments

We thank Frank Lederle for providing unpublished information from some of the studies on detecting abdominal aortic aneurysm in table 2. FAM is supported by the Medical Research Council of Canada. SES and DLS are supported by the NHS Research and Development Programme, UK.

References

1 Crombie DL. Diagnostic process. J Coll Gen Pract 1963; 6: 579-89.

2 Sandler G. The importance of the history in the medical clinic and the cost of unnecessary tests. Am Heart J 1980; 100: 928-31.

3 Patel A, Sackett DL. The referral forces that raise prevalence also lower specificity. Clin Res 1992; 40: 370.

4 Diamond GA, Forrester JS. Analysis of probability as an aid in the clinical diagnosis of coronary-artery disease. N Engl J Med 1979; 300: 1350-58.

5 Fleming KA. Evidence-based pathology [EBM note]. Evidence Based Med 1997; 2: 132.

6 Wells PS, Anderson DR, Bormanis J, et al. Value of assessment of pretest probability of deep-vein thrombosis in clinical management. Lancet 1997; 350: 1795-98.

7 Sackett DL, Richardson WS, Rosenberg W, Haynes RB. Evidence-based medicine: how to practice and teach EBM. London: Churchill Livingstone, 1997.

8 Sackett DL. A primer on the precision and accuracy of the clinical examination. JAMA 1992; 267: 2638-44.

9 Holleman DR, Simel DL, Goldberg JS. Diagnosis of obstructive airways disease from the clinical examination. J Gen Intern Med 1993; 8: 63-68.

10 Mulrow CD, Dolmatch BL, Delong ER, et al. Observer variability in the pulmonary examination. J Gen Intern Med 1986; 1: 364-67.

11 Badgett RG, Tanaka DJ, Hunt DK, et al. Can moderate chronic obstructive pulmonary disease be diagnosed by historical and physical findings alone? Am J Med 1993; 94: 188-96.

12 Fletcher CM. The clinical diagnosis of pulmonary emphysema--an experimental study. Proc R Soc Med 1952; 45: 577-84.

13 Schapira RM, Schapira MM, Funahashi A, McAuliffe TL, Varkey B. The value of the forced expiratory time in the physical diagnosis of obstructive airways disease. JAMA 1993; 270: 731-36.

14 Lal S, Ferguson AD, Campbell EJM. Forced expiratory time: a simple test for airways obstruction. BMJ 1964; i: 814-17.

15 Scheinhorn DJ. Screening test for airflow limitation. South Med J 1982; 75: 434-38.

16 Kern DG, Patel SR. Auscultated forced expiratory time as a clinical and epidemiologic test of airway obstruction. Chest 1991; 100: 636-39.

17 Godfrey S, Edwards RHT, Campbell EJM, Armitage P, Oppenheimer EA. Repeatability of physical signs in airways obstruction. Thorax 1969; 24: 4-9.

18 Hepper NG, Hyatt RE, Fowler WS. Detection of chronic obstructive lung disease: an evaluation of the medical history and physical examination. Arch Environ Health 1969; 19: 806-13.

19 Van Schayk CP, van Weel C, Harbers HJM, van Herwaarden CLA. Do physical signs reflect the degree of airflow obstruction in patients with asthma or chronic obstructive pulmonary disease? Scand J Prim Health Care 1991; 9: 232-38.

20 Nairn JR, Turner-Warwick M. Breath sounds in emphysema. Br J Dis Chest 1969; 63: 29-37.

21 Pardee NE, Winterbauer RH, Morgan EH, Allen JD, Olson DE. Combinations of four physical signs as indicators of ventilatory abnormality in obstructive pulmonary syndromes. Chest 1980; 77: 354-58.

22 Marini JJ, Pierson DJ, Hudson LD, Lakshminarayan S. The significance of wheezing in chronic airflow obstruction. Am Rev Respir Dis 1979; 120: 1069-72.

23 King DK, Thompson BT, Johnson DC. Wheezing on maximal forced exhalation in the diagnosis of atypical asthma: lack of sensitivity and specificity. Ann Intern Med 1989; 110: 451-55.

24 Surprenant EL, Vance JW. Evaluation of methods for the early detection of chronic obstructive ventilatory diseases. Dis Chest 1967; 52: 760-66.

25 Mannino DM, Etzel RA, Flanders WD. Do the medical history and physical examination predict low lung function? Arch Intern Med 1993; 153: 1892-97.

26 Detsky A, Smalley PS, Chang J. Is this patient malnourished? JAMA 1994; 271: 54-58.

27 Anand SS, Wells PS, Hunt D, Brill-Edwards P, Cook D, Ginsberg JS. Does this patient have deep vein thrombosis? JAMA 1998; 279: 1094-99.

28 Williams JW, Simel DL. Does this patient have sinusitis? Diagnosing acute sinusitis by history and physical examination. JAMA 1993; 270: 1242-46.

29 Cook DJ, Simel DL. Does this patient have abnormal central venous pressure? JAMA 1996; 275: 630-34.

30 Etchells E, Bell C, Robb K. Does this patient have an abnormal systolic murmur? JAMA 1997; 277: 564-71.

31 Badgett RG, Lucey CR, Mulrow CD. Can the clinical examination diagnose left sided heart failure in adults? JAMA 1997; 277: 1712-19.

32 Panju AA, Hemmelgarn BR, Guyatt GH, Simel DL. Is this patient having a myocardial infarction? JAMA 1998; 280: 1256-63.

33 Turnbull JM. Abdominal bruits: is listening for abdominal bruits useful in the evaluation of hypertension? JAMA 1995; 274: 1299-301.

34 Lederle FA, Simel DL. Does this patient have an abdominal aortic aneurysm? JAMA 1999; 281: 77-82.

35 Naylor CD. Physical examination of the liver. JAMA 1994; 271: 1859-65.

36 Williams JW, Simel DL. Does this patient have ascites? How to divine fluid in the abdomen. JAMA 1992; 267: 2645-48.

37 Grover SA, Barkun AN, Sackett DL. Does this patient have splenomegaly? JAMA 1993; 270: 2218-21.

38 Simel DL, Rennie D. The clinical examination: an agenda to make it more rational. JAMA 1997; 277: 572-74.

39 Whited JD, Grichnik JM. Does this patient have a mole or a melanoma? JAMA 1998; 279: 696-701.

40 Goldstein LB, Matchar DB. Clinical assessment of stroke. JAMA 1994; 271: 1114-20.

41 Sackett DL, Rennie D. The science of the art of the clinical examination. JAMA 1992; 267: 2650-52.

42 Petersdorf R. Is the establishment defensible? N Engl J Med 1983; 309: 1053-57.

43 Niebauer L, Nutting PA. Primary care practice-based research networks active in North America. J Fam Pract 1994; 38: 425-26.

44 Green LA, Hames CG, Nutting PA. Potential of practice-based research networks: experiences from ASPN. J Fam Pract 1994; 38: 400-06.

45 Froom J, Culpepper L, Grob P, et al. Diagnosis and antibiotic treatment of acute otitis media: report from International Primary Care Network. BMJ 1990; 300: 582-86.

46 Nutting PA. Practice-based research networks: building the infrastructure of primary care research. J Fam Pract 1996; 42: 199-203.

47 Yusuf S, Held P, Teo KK. Selection of patients for randomized controlled trials: implications of wide or narrow eligibility criteria. Stat Med 1990; 9: 73-86.