Measuring the Clinical Competence of Anesthesiologists
The problem in perspective
Why is so much attention directed at judging physician competence in the United States? Much of this attention has been thrust on health care practitioners by the Institute of Medicine (IOM). The IOM was chartered in the United States in 1970 by the National Academy of Sciences. According to the Academy’s 1863 charter, its charge is to enlist distinguished members of the appropriate professions in the examination of policy matters pertaining to public health and to act as an adviser to the federal government on issues of medical care, research, and education. In 1998, the Quality of Health Care in America Project was initiated by the IOM with the goal of developing strategies that would result in a threshold improvement in quality in the subsequent 10 years [1].
Toward that goal, the Quality of Health Care in America Project published a series of reports on health care quality in the United States. The first in the series was entitled To Err Is Human: Building a Safer Health System. This report on patient safety addressed a serious issue affecting the quality of our health care, specifically human error. This first report began by citing 2 large US studies, one conducted in Colorado and Utah and the other in New York, which found that adverse events occurred in 2.9% and 3.7% of hospitalizations, respectively. In the Colorado and Utah hospitals, 8.8% of adverse events led to death, compared with 13.6% in New York hospitals. In both of these studies, more than half of these adverse events resulted from medical errors and, according to the IOM, could have been prevented [1].
When extrapolated to the more than 33.6 million hospital admissions in the United States during 1997, the results of these studies implied that 44,000 to 98,000 Americans avoidably die each year as a result of medical errors. Even when using the lower estimate, death caused by medical errors becomes the eighth leading cause of death in the United States. More people die in a given year as a result of medical errors than from motor vehicle accidents (43,458), breast cancer (42,297), or AIDS (16,516). The IOM estimated the costs of preventable adverse events, including lost income, lost household production, disability, and health care costs, to be between 17 and 29 billion dollars annually; health care expenditures constitute more than half of these costs [1].
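The extrapolation behind the range of 44,000 to 98,000 deaths can be reconstructed approximately from the figures above. The preventable fractions used below (a little more than half of adverse events in each study, taken here as 0.51 and 0.58) are assumed values for illustration only; the exact proportions are not restated in this section:

$$
\begin{aligned}
\text{Colorado/Utah:}\quad & 33.6\times10^{6} \times 0.029 \times 0.088 \times 0.51 \approx 44{,}000\\
\text{New York:}\quad & 33.6\times10^{6} \times 0.037 \times 0.136 \times 0.58 \approx 98{,}000
\end{aligned}
$$

Here the factors are, in order, annual hospital admissions, the adverse event rate, the fraction of adverse events leading to death, and the assumed preventable fraction.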
The IOM believed that the level at which care is delivered would be the ultimate target of all of its recommendations. By way of example, anesthesiology was cited as an area in which impressive improvements in safety had been made at the delivery level. The initial report of the Quality of Health Care in America Project stated, “As more and more attention has been focused on understanding the factors that contribute to error and on the design of safer systems, preventable mishaps have declined. Studies, some conducted in Australia, the United Kingdom and other countries, indicate that anesthesia mortality is about 1 death per 200,000 to 300,000 anesthetics administered, compared with 2 deaths per 10,000 anesthetics in the early 1980s [1].” The reference cited for this marked improvement in anesthesia-related mortality does not describe the study that resulted in the lower rate quoted by the IOM and does not have an author listed [2]. Some believe that the IOM’s claim of improved anesthesia mortality resulted from a study by John Eichhorn, who examined 11 cases of major intraoperative accidents that had been reported to a malpractice insurance carrier between 1976 and 1988 [3]. In an effort to remove disease and postoperative care as contributing factors, Eichhorn’s study considered only patients with an American Society of Anesthesiologists (ASA) Physical Status of I or II who died intraoperatively. Therefore, 5 intraoperative anesthesia-related deaths out of an insured population of 1,001,000 ASA Physical Status I or II patients resulted in a mortality rate of 1 per 200,200 in which anesthesia was considered the sole contributor [4]. In a 2002 review of the published literature, anesthesia-related mortality in less exclusive general patient populations ranged from 1 in 1388 anesthetics to 1 in 85,708 anesthetics, and preventable anesthetic mortality ranged from 1 in 1707 anesthetics to 1 in 48,748 anesthetics. When anesthesia-related death is defined as a perioperative death to which human error on the part of the anesthesia provider has contributed, as determined by peer review, then anesthesia-related mortality is estimated to be approximately 1 death per 13,000 anesthetics [3].
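As a point of reference, both rates quoted above follow directly from the underlying counts, and the contrast between definitions is large: the Eichhorn estimate and the peer-review-based estimate differ by roughly a factor of 15.

$$
\frac{5\ \text{deaths}}{1{,}001{,}000\ \text{anesthetics}} \approx \frac{1}{200{,}200} \approx 0.5\ \text{per}\ 100{,}000, \qquad \frac{1}{13{,}000} \approx 7.7\ \text{per}\ 100{,}000
$$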
As noted in the corrective strategies listed earlier, error reporting and peer review were among the recommendations of the IOM report. It was believed that a nationwide, mandatory public-reporting system should be established for the collection of standardized information about adverse events that result in death or serious patient harm. Despite this aggressive approach, the Quality of Health Care in America Project also saw a role for voluntary, confidential reporting systems. Their initial report recommended that voluntary reporting systems be encouraged to examine the less severe adverse events and that these reports be protected from legal discoverability. In this model, information about the most serious adverse events that result in harm to patients, and which are subsequently found by peer review to result from human errors, would not be protected from public disclosure. For less severe events, public disclosure was not recommended by the IOM because of concerns that fear about legal discoverability of information might undermine efforts to analyze errors to improve safety [1].
In the second report from the Quality of Health Care in America Project, the IOM described the gap between our current health care system and an optimal health care system as a “quality chasm.” The report went on to say that efforts to close this gap should include analysis and synthesis of the medical evidence, establishment of goals for improvement in care processes and outcomes, and development of measures for assessing quality of care. This second report also emphasized the importance of aligning payment policies with quality improvement, and changing the ways in which health professionals are regulated and accredited [5].
Licensure and certification
Obtaining an initial license to practice medicine in the United States is a rigorous process. State medical boards universally ensure that physicians seeking licensure have met predetermined qualifications that include graduation from an approved medical school, postgraduate training of 1 to 3 years, background checks of professional behavior with verification by personal references, and passage of a national medical licensing examination. All states currently require applicants to pass the United States Medical Licensing Examination (USMLE), or past equivalent. Passing the USMLE is a 3-step process. Step 1 assesses whether the applicant understands and can apply the basic sciences to the practice of medicine, including scientific principles required for maintenance of competence through lifelong learning. This assessment is in the form of an examination made up of multiple-choice questions with one best answer. Step 2 assesses the clinical knowledge and skills essential for the provision of safe and competent patient care under supervision. The clinical knowledge assessment is also in the form of an examination made up of multiple-choice questions, but the clinical skills assessment uses standardized patient models to test an applicant’s ability to gather information from patients, perform physical examinations, and communicate their findings to patients and colleagues. Step 3 assesses whether an applicant can apply medical knowledge and understanding of biomedical and clinical science in the unsupervised practice of medicine, with emphasis on patient management in ambulatory settings. This part of the USMLE also takes the form of an examination made up of multiple-choice questions. Although initial medical licensure relies heavily on examinations composed of multiple-choice questions, most agree that it is a moderately rigorous process with sufficient state oversight to assure initial physician competence and to provide a measure of valuable public protection [6].
Although the achievement of licensure to practice medicine is generally accepted as adequate assurance of initial competence, the processes in place for assessment of continuing competence have raised increasing concern among medical professionals, licensing authorities, and other interested parties, including the general public. After physicians are initially licensed, they must renew their license to practice medicine every 2 to 3 years to continue their active status. During this renewal process, physicians must show that they have maintained acceptable standards of professional conduct and medical practice as shown by a review of the National Practitioner Data Bank (NPDB), the Federation Physician Data Center, and other sources of public information held by the states. In most states, physicians must also show they have participated in a program of continuing medical education and are in good health. These criteria are often satisfied by a declaration by the physician that he or she has completed approximately 40 hours of continuing medical education over the past 2 years, and has continued in the active practice of medicine with no known physical or mental impediments to that practice. The renewal process does not involve an examination of knowledge, practical demonstration of competence, or peer review of practice [7].
Peer review
In 1986, Governor Mario Cuomo of New York State announced his plan to have physician credentials periodically recertified as part of the renewal process for medical licensure. He convened the New York State Advisory Committee on Physician Recredentialing, which subsequently recommended that physicians be given 3 options for satisfying the requirements of relicensure. These options included: (1) specialty board certification and recertification, (2) examination of knowledge and problem-solving ability, and (3) peer review in accord with standardized protocols. In 1989, the New York State Society of Anesthesiologists (NYSSA) began developing a model program of quality assurance and peer review to meet the evolving requirements for the recredentialing and relicensure of anesthesiologists in New York State. In that same year, the ASA endorsed a peer review model, developed by Vitez [8,9], which created error profiles for comparison of practitioners. The NYSSA modified this model for the purpose of recredentialing and relicensure of anesthesiologists with the belief that standardized peer review was the only appropriate method for identifying patterns of human error in anesthesiologists [10]. The NYSSA hoped that a standardized peer review model would permit development of a statewide clinical profile summarizing the performance of all anesthesiologists practicing in the state. Conventional statistical methods would then be used to compare the clinical profiles of individual anesthesiologists with the statewide profile to identify outliers who might need remediation.
In a recent study of 323,879 anesthetics administered at a university practice using a structured peer review of adverse events, 104 adverse events were attributed to human error, for a rate of 3.2 per 10,000 anesthetics. With this knowledge, faculty of this university practice were asked what rate of human error by an anesthesiologist would indicate the need for remedial training and what rate would suggest incompetence. The median human error rates believed to indicate the need for remedial training and to suggest incompetence were 10 and 12.5 per 10,000 anesthetics, respectively. Power analysis tells us that, if we were willing to be wrong about 1 out of 100 anesthesiologists judged to be incompetent (alpha error of 0.01) and 1 out of 20 anesthesiologists judged to be competent (beta error of 0.05), then a sample size of 21,600 anesthetics per anesthesiologist would be required [11]. Even at these unacceptably high levels of alpha and beta error, an appropriate sample size could require more than 2 decades to collect. Therefore, the concept of using human error rates to judge clinical competence is not feasible, and this has implications for all database registries designed for this purpose.
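A rough sense of where a sample-size figure of this magnitude comes from can be obtained with the usual normal-approximation calculation for a one-sided test of a single proportion against the observed baseline. The sketch below is an illustration only: the specific formula and the one-sided test are assumptions, and the cited study [11] may have used a different method, so the output should be read as an order-of-magnitude check rather than a reproduction of the published figure of 21,600.

```python
# Rough one-sample sample-size sketch: how many anesthetics would be needed
# to distinguish a provider whose true error rate sits at the "remediation"
# threshold from the observed baseline rate, using the standard normal
# approximation for a test of one proportion. The formula and one-sided
# test are assumptions for illustration; the cited study may have used a
# different method, so treat the result as an order-of-magnitude check.
from math import sqrt
from scipy.stats import norm

p0 = 3.2 / 10_000    # baseline human-error rate (per anesthetic)
p1 = 10.0 / 10_000   # rate suggested to indicate need for remedial training
alpha, beta = 0.01, 0.05

z_a, z_b = norm.ppf(1 - alpha), norm.ppf(1 - beta)
n = ((z_a * sqrt(p0 * (1 - p0)) + z_b * sqrt(p1 * (1 - p1))) / (p1 - p0)) ** 2
print(f"anesthetics required per provider: {n:,.0f}")  # on the order of 20,000
```

At an assumed individual caseload on the order of 1000 anesthetics per year, a sample of this size is what underlies the observation that more than 2 decades of practice could be required.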
Closed claims and the NPDB
The Health Care Quality Improvement Act of 1986 led to the establishment of the NPDB, an information clearinghouse designed to collect and release certain information related to the professional competence and conduct of physicians. The establishment of the NPDB was believed to be an important step by the US Government to enhance professional review efforts by making certain information concerning medical malpractice payments and adverse actions publicly available. As noted earlier, the NPDB lacks the denominator data necessary to determine individual provider error rates to judge clinical competence. Even if individual denominator data were available, malpractice closed claims data are also likely to lack the statistical power necessary to be a feasible measure of clinical competence. For example, in a study of 37,924 anesthetics performed at a university health care network between 1992 and 1994, 18 cases involved legal action directed at an anesthesia provider. An anesthesiologist was the sole defendant named in 2 malpractice claims, only one of which resulted in a $60,000 award. A single letter of intent also named an anesthesiologist as the sole defendant. In the 15 additional legal actions, an anesthesia provider was named as codefendant in 3 claims and implicated in 12 letters of intent. The incidence of all legal actions against the anesthesia practitioners in this sample was 4.7 per 10,000 anesthetics, and the single judgment against a practitioner in this sample represents a closed claims incidence of 0.26 per 10,000 anesthetics [12].
More importantly, there may be no relationship between malpractice litigation and human errors by anesthesiologists. In the sample that yielded 18 cases involving legal action, there were a total of 229 adverse events that resulted in disabling patient injuries. Of these 229 disabling patient injuries, 13 were considered by peer review to have resulted from human error, or deviations from the standard of care, on the part of the anesthesia provider. The rate of anesthetist error leading to disabling patient injuries, therefore, was 3.4 per 10,000 anesthetics. Comparison of legal action and deviations from the standard of care showed the 2 groups to be statistically unrelated. None of the 13 cases in which a disabling injury was caused by deviations from the standard of care, as determined by peer review, resulted in legal action; and none of the 18 cases involving legal action was believed to be due to human error on the part of the anesthesia provider. Therefore, closed malpractice claims lack both statistical power and face validity as a measure of competence [12].
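The sense in which the two groups were statistically unrelated can be illustrated with a simple contingency-table check. The 2 x 2 reconstruction below is an assumption based only on the counts given in the text (13 error-related injuries, 18 legal actions, no overlap, 37,924 anesthetics); the original analysis in [12] may have been performed differently. The point it makes is that, with events this rare, independence predicts essentially no overlap between the groups, which is exactly what was observed.

```python
# Illustration of why legal action and provider error appear unrelated:
# with events this rare, independence already predicts near-zero overlap.
# The 2x2 table is an assumed reconstruction from the counts in the text,
# not the analysis actually reported in the cited study.
from scipy.stats import fisher_exact

total = 37_924
errors = 13      # disabling injuries judged due to provider error
legal = 18       # cases involving legal action
overlap = 0      # cases appearing in both groups

table = [[overlap, errors - overlap],
         [legal - overlap, total - errors - legal + overlap]]

expected_overlap = errors * legal / total
_, p_value = fisher_exact(table, alternative='two-sided')
print(f"expected overlap under independence: {expected_overlap:.3f}")
print(f"Fisher exact p-value: {p_value:.3f}")
```

Because the expected overlap under independence is far below 1 case, observing zero overlap is unsurprising, which reinforces how little information closed claims carry about individual provider error.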
Indicators of clinical competence and face validity
Malpractice claims are not the only indicator of clinical competence that may lack validity. The first anesthesia clinical indicators developed in the United States came from the Joint Commission (TJC), formerly known as the Joint Commission on Accreditation of Healthcare Organizations. These original 13 anesthesia clinical indicators (Box 1) were adverse perioperative events that were intended to trigger a peer review process to assess the contribution of anesthesia care. Before the release of these indicators in 1992, TJC conducted alpha testing for face validity and ease of data collection in a limited number of health care facilities. After their initial release, these indicators were subjected to beta testing, in which similar characteristics were evaluated in a broader range of health care organizations. Following the completion of the beta phase in 1993, the original 13 anesthesia clinical indicators were reduced by TJC to 5 perioperative performance indicators in an effort to make them applicable to a broader range of institutions and to emphasize that these adverse outcomes are not specific to errors in anesthesia care.
Box 1 Anesthesia clinical indicators drafted in 1992 by TJC
The original 13 anesthesia clinical indicators developed in the United States by TJC were reduced to 5 perioperative performance indicators after testing for face validity and feasibility of data collection.
Similarly, a recent systematic review by Haller and colleagues [13] identified 108 clinical indicators related to anesthesia care, and nearly half of these measures were also affected by surgical or postoperative ward care. Using the definitions of Donabedian [14], 42% of these indicators were process measures, 57% were outcome measures, and 1% related to structure. All were felt to have some face validity, but validity assessment relied solely on expert opinion 60% of the time. Perhaps more disconcerting, the investigators found that only 38% of proscriptive process measures were based on large randomized controlled trials or systematic reviews [13].
Metric attributes
Although showing the validity of performance measures should be necessary for judging clinical competence, it may not be sufficient when these performance measures are intended to influence physician reimbursement for patient care. In 2005, a group of 250 physicians and medical managers from across the United States convened a conference to produce a consensus statement on how “outcomes-based compensation arrangements should be developed to align health care toward evidence-based medicine, affordability and public accountability for how resources are used.” This consensus statement recommended several important attributes for measures included in pay-for-performance (P4P), or value-based compensation, programs. These attributes included high volume, high gravity, strong evidence basis, a gap between current and ideal practice, and good prospects for quality improvement, in addition to the already discussed reliability, validity, and feasibility [15].