T Drysdale Buchanan, MD, New York, New York
John S Lundy, MD
EA Rovenstine, MD, New York, New York
Henry S Ruth, MD
H Boyd Stewart, MD
Ralph M Tovell, MD
Ralph M Waters, MD
Paul M Wood, MD, New York, New York
Philip D Woodbridge, MD
In 1941, the ABA became a primary board, no longer a sub-board of the American Board of Surgery. Then, as now, the Board existed primarily to credential candidates (confirm their suitability to enter the certification process) and to certify them. In the past two decades, the ABA assumed certification of practitioners in two subspecialties within anesthesiology (Critical Care and Pain Medicine). It developed recertification for its primary and subspecialty certificates and, most recently, converted recertification into Maintenance of Certification in Anesthesiology (MOCA). Notwithstanding these diverse activities, certification remains its primary business.
This essay describes the evolution of the examination process for the primary certificate through approximately 2005, where my direct source material ends. That process has changed radically since the first examinations in 1938 and 1939. Logistics changed because of the dramatic growth of anesthesiology and the consequent enrollment of increasing numbers of candidates for examination. The ABA's sustained commitment to fairness provided the underlying platform for the evolution of the process. Fairness requires that every candidate be treated equally, but how could that be done for oral examinations given by examiners of diverse qualities, some more demanding, some less? How could it be done if the questions asked by different examiners differed? How does one factor into the process the experience of the examiner, or the time of day of the examination? And do the written and oral examinations test relevant knowledge and skills? This history of the ABA describes the struggle to answer these questions, and to achieve the best, if imperfect, answers.
The ABA modeled the examination process on the written, oral, and practical components of the American Board of Surgery examination. Thus, the original Directors created a written essay-style examination with five basic science components: anatomy, pathology, pharmacology, physics and chemistry, and physiology. In turn, each component consisted of five questions. Candidates had to answer three of the five questions in each component within 45 minutes. Three directors on the Examinations Committee graded all examinations: the first ABA president, Thomas Buchanan; Wood; and Emery Rovenstine, the fabled chair at Bellevue Hospital and a disciple of Ralph Waters. All lived in the New York area and shuttled papers among themselves to complete the grading. As is the case today, graders of that first examination were blinded to the name, personal information, and location of the candidate. As the candidate pool grew, restricting grading to a small committee of directors became impossible, but at least one ABA director read every examination and retained final say over grading. Within a few years, Ralph Tovell and Lundy, both original directors, noted that teaching programs had obtained examination questions and changed their curricula in response. Some critics complained that the ABA gave the hardest basic science test of any specialty; Waters and Philip Woodbridge, the first anesthesia chair at Temple University, accepted that as a compliment.
Initially, oral examinations were intended to test the candidate's proficiency in clinical anesthesiology, but were unstructured: examiners could ask questions about any area of anesthesia they chose. Unlike the written examination, examiners reviewed the candidate's application and list of publications before the test, and often asked the candidate "what he thought he did best." Not surprisingly, examination content varied widely. Each candidate was tested for 10 minutes in each of three rooms, with two examiners in each room. ABA directors gave the first sets of examinations, but other "Founders" soon came to help, although only the directors graded the content. Then and subsequently, no one with personal knowledge of a candidate could examine that candidate.
After satisfactory completion of the written and oral examinations, a director or Founder went to the candidate's hospital, watched the candidate provide anesthesia care, observed interactions with patients and surgical colleagues, and even investigated whether the candidate worked for a fee or was salaried by a hospital; salaried anesthesiologists were then considered to be acting unethically because all other physicians billed fee-for-service. On completion of the practical examination, the Board voted whether to award a certificate to the candidate. By 1940, the ABA was already wavering on the feasibility of continuing the practicum phase of the certification process. Discussion in the ABA minutes focused on the impracticality of the exercise, particularly after the influx of candidates from World War II. There was no official end to the practical examination, nor was there mention of the last candidate so examined.
Rather than describe the evolution of all three components of the examination process simultaneously, the narrative now will follow each separately.
The Written Examination
The enormous surgical demands imposed by World War II mandated drafting many young physicians into anesthesiology. Most became on-the-job trainees. For the few who were formally trained and eligible for examination after credentialing, the ABA initially provided written and oral examinations in Europe and the Far East, but that practice soon became unworkable.
In 1944, Meyer Saklad (Fig. 35.1), one of the first nine diplomates and one of the creators of the original American Society of Anesthesiologists' Physical Status Classification, became a director of the Board and the Board's first innovator. He questioned the reliability of essay-type examinations (the likelihood that the same candidate would get the same grade on a similar test at a different sitting), arguing that the questions were too open-ended and responses too broad to establish true knowledge. Regardless, the horde of returning military anesthesiologists made the grading of essay examinations impractical. With little fanfare, the 1948 written examination of the ABA morphed into 125 multiple-choice questions, and the next year into 300 questions, divided equally among the original five areas. The passing (cut) score was arbitrarily set at 67% correct.
Meyer Saklad. (Courtesy of the Wood Library-Museum of Anesthesiology, Park Ridge, IL.)
In 1950, administration of the written examination decreased from twice to once a year. The multiple-choice format was now firmly entrenched, although ABA minutes describe the directors agonizing over teaching programs gearing their didactic material toward passing such tests. When Fred Haugen of the University of Oregon succeeded Saklad as chair of the Examinations Committee, he opted to add more questions requiring reason and logic rather than just memory. By now, the written examination had a normative passing score; i.e., a certain percentage of candidates would pass, independent of any arbitrary percent correct. In 1955, Saklad questioned this approach and suggested the imposition of a minimum pass score, regardless of the success rate. Saklad challenged the Examinations Committee to evaluate the test questions and decide which should be answered correctly by a passing candidate, but little came of this proposal. Prompted by such controversies, the ABA contracted with the Educational Testing Service (ETS) in Princeton to provide psychometric support. Two years later, ETS recommended eliminating the rigid five content sections and reducing the length of the test from 300 to 180 items.
A "reliability index" provides a statistical assessment of the consistency or precision of a test, i.e., the likelihood that the same candidates taking a parallel but different examination at another time would generate the same score. It ranges from zero (no reliability) to 1.0 (perfect reliability). The changes instituted by Saklad and ETS dramatically increased the reliability index, from 0.70 in 1956 to greater than 0.90 in 1957. However, "validity" (does the test examine what you hope it does?) was more elusive to assess, and had to rest on "content validity": the examination must be built from a clearly defined content outline, and test takers qualified in that domain must perform better than those less well prepared in it.
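The source does not state which formula ETS used to compute this index. In practice, reliability of a multiple-choice test is often estimated with an internal-consistency statistic such as Kuder-Richardson Formula 20 (KR-20); the sketch below is purely illustrative of that standard approach, not the ABA's actual method.

```python
# Illustrative sketch: estimating test reliability with KR-20.
# The ABA/ETS computation is not specified in the source; KR-20 is
# one common internal-consistency estimator for dichotomous items.

def kr20(responses):
    """responses: list of candidates, each a list of 0/1 item scores."""
    n_items = len(responses[0])
    n_cand = len(responses)
    # Proportion of candidates answering each item correctly
    p = [sum(cand[i] for cand in responses) / n_cand for i in range(n_items)]
    # Variance of the candidates' total scores
    totals = [sum(cand) for cand in responses]
    mean = sum(totals) / n_cand
    var = sum((t - mean) ** 2 for t in totals) / n_cand
    # Sum of item variances p(1-p)
    sum_pq = sum(pi * (1 - pi) for pi in p)
    return (n_items / (n_items - 1)) * (1 - sum_pq / var)
```

With perfectly consistent data (candidates either answer everything correctly or nothing), the index is 1.0; as item responses become uncorrelated, it falls toward zero.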
In 1960, Harvey Slocum, chair of Anesthesiology at the University of Texas Medical Branch in Galveston, became chair of the Examinations Committee. He enacted major changes in the written examination. First, the test would include 50 previously used questions from the 1957 and 1958 examinations for equating purposes. Equating permitted comparison of the performance of examinees from different years on the same questions, and thereby allowed psychometric conclusions about new items. Second, to allow reproducible analyses, statistical evaluations would be made only for data from first-time takers. With the benefit of these changes, the ABA went from a normative scoring system, in which a predetermined pass rate was used, to a criterion-based scoring system, in which a predetermined cut score was used, resulting in a pass rate of 84% the first time it was applied. But this low failure rate produced discomfort for the Board, and two years later the Board returned to normative scoring.
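The source does not describe the equating computation itself. One simple, standard approach consistent with the idea of reusing anchor items is linear (mean-sigma) equating: scores on a new form are mapped onto the reference year's scale using each cohort's statistics on the common questions. The sketch below is an assumption-laden illustration of that general technique, not the ABA's documented procedure.

```python
# Illustrative sketch of linear (mean-sigma) equating via common
# anchor items, one standard way to implement the "equating" the
# text describes. The ABA/ETS method is not given in the source.

def linear_equate(new_score, new_mean, new_sd, ref_mean, ref_sd):
    """Map a score from the new form onto the reference-year scale.

    new_mean/new_sd: anchor-item statistics for the current cohort.
    ref_mean/ref_sd: anchor-item statistics for the reference cohort.
    Assumes the anchor items measure the same trait in both years.
    """
    z = (new_score - new_mean) / new_sd          # standardize on new form
    return ref_mean + z * ref_sd                 # re-express on reference scale
```

For example, a candidate one standard deviation above this year's anchor mean is assigned the score one standard deviation above the reference year's anchor mean, so cut scores keep a comparable meaning across years.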
In 1965, the examination included more equating questions and increased to 210 items to prevent dilution by previously used questions. One year later, the ABA switched from ETS to the National Board of Medical Examiners (NBME) for psychometric support. The NBME distrusted equating, prompting removal of all previously used items. The cut score again became normative, with a pass rate set arbitrarily at 75% for first-time takers. Despite such arbitrariness, considerable data suggested that the written examination was performing well. The reliability index (see above) equaled 0.94, and director James Matthews noted that the examination had never had a reliability index less than 0.86, a success level not yet achieved by any other specialty board. In that year, the "discrimination index" (comparing aggregate performance on items by those who passed and those who failed the test) was 0.37; anything greater than 0.25 is considered a good score. The Board was increasingly satisfied with the state of the written examination.
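A discrimination index of the kind described above can be computed per item as the difference between the proportion of passing candidates and the proportion of failing candidates who answered the item correctly, then averaged over items. The sketch below illustrates that per-item calculation; the exact NBME formula is not given in the source.

```python
# Illustrative sketch of an item discrimination index as described
# in the text: proportion correct among passers minus proportion
# correct among failers. The NBME's exact computation is not stated.

def discrimination_index(item_scores, passed):
    """item_scores: 0/1 per candidate for one item.
    passed: True/False per candidate for the whole test."""
    upper = [s for s, p in zip(item_scores, passed) if p]
    lower = [s for s, p in zip(item_scores, passed) if not p]
    return sum(upper) / len(upper) - sum(lower) / len(lower)
```

An item answered correctly by every passer and no failer scores 1.0; an item that passers and failers answer equally often scores 0.0 and tells the examiners nothing about who should pass.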
In 1967, two seemingly minor events changed the Board forever. An ad hoc committee of delegates from the Association of University Anesthetists, the American Society of Anesthesiologists (ASA), the AMA section on Anesthesiology, the Academy of Anesthesiology, and the ABA met to consider the possibility of an in-service examination for all US anesthesiology residents. David Little, secretary of the ABA, was the Board's delegate. Nothing came of the effort initially, but the seed for this concept was planted. The other event was election to the Board of the then chair of anesthesiology at Baylor Medical College, Arthur Keats (Fig. 35.2). Over the next twelve years, Keats changed the Board's written and oral examinations more than any director before or since. No challenge was too great, no detail too small.
Arthur Keats. (Courtesy of the Wood Library-Museum of Anesthesiology, Park Ridge, IL.)
Keats became chair of the Examinations Committee in 1971, the same year he became editor-in-chief of the journal Anesthesiology. The committee immediately started discussions of computer-based examinations. Keats opined that such technology would not be appropriate for the primary examination, but that "if recertification (an idea he and the rest of the Board opposed) ever came to pass, a computer-based examination might be ideal." He accurately foresaw the future.
Also in 1971, a second discussion of an in-service examination for all US anesthesiology residents was initiated between representatives of the ASA's American College of Anesthesiology (ACA) and the Board. For decades, the College had run a shadow certification process for anesthesiologists. Both the ACA and the ABA certified many practitioners, but the ACA also certified many whom the ABA considered unqualified by training or who had failed the Board's examination. These competing certificates were a source of confusion to hospitals and their credentialing committees. Keats and Little represented the ABA and proposed that leadership of any in-service examination for all US anesthesiology residents must remain in the hands of the ABA, that the examination should be administered annually, and that participation would be a certification requirement of the ABA. At the next ABA meeting (1972), this enormously controversial proposal was discussed. First, the ABA had reservations about creating examinations with outsiders from the ASA; second, it recognized the enormity of the undertaking. Albert Betcher declared opposition to the plan if the ACA continued its certification. Keats assured the Board that if the proposed plan were enacted, the ACA would cease awarding certificates, which it did a few years later.
In 1973, an ABA/ACA Liaison Committee for In-Training Examinations was formed. Membership from the ASA included Harry Bird, Charles Coakley and William Eggers. William Hamilton, Robert Patrick, and Keats as chair represented the ABA. Note the change in title from “in-service” to “in-training,” a distinction that persists between surgery and anesthesiology to this day (surgical residents are in “servitude” and anesthesia residents are in “training”). The committee was charged with creating an examination that could be administered to all residents by 1975. Ultimately the ABA planned to use a subset of that examination as the written ABA examination. The challenge was so enormous that both organizations prepared their own examinations to use if creation of the in-training examination failed.
A key challenge for the committee was that a written certifying examination needs to test only a sample of knowledge from the field to be valid; but a useful in-training examination must be comprehensive; i.e., truly examine the total content of the domain of knowledge. No one had previously focused on the scope of the specialty. The liaison committee immediately went to work on creating a content outline (grid) that defined the entire contemporary scope of anesthesiology. Concurrently, Keats asked the NBME to help him create a new metric to assess whether the examination would test what was actually learned during residency training, as opposed to other educational experiences.
In the spring of 1975, amazingly on schedule, 1700 residents at multiple centers across the U.S. took the first in-training examination. The test had 350 questions: 140 from old ABA and ACA examinations, and 210 generated by writers solicited by Keats from US program directors. Performance on that examination would serve as the comparator for future examinations. In that year, the ad hoc committee petitioned its two parent organizations to change its status to a permanent council, and the ABA-ASA Council on In-Training Examinations resulted. Keats (chair), Bird, Benson, and Slogoff represented the ABA; Eggers, Howard Zauder, Coakley, and Alan Sessler represented the ASA. As part of its charter, and as a requirement of the ABA, the chair of the Council would always reside in the Board.
In the early days of the In-Training Examination (ITE) Council, all examination questions were reviewed and edited by the eight Council members. Disagreement about which of the five answer choices was correct was not atypical. It usually went like this: Keats, as chair, would let the discussion run for several minutes, during which one distinguished member would argue for answer "A," another for "C," and another for "D." After 10 minutes, Keats would say, "Time's up. The answer is 'E.' Let's move on."
The second examination, in May 1976, was also given to all Canadian residents and was equally successful. Several enhancements were added. First, the "grid" was formalized as the "Content Outline of Anesthesiology," a description of the scope and breadth of the specialty, updated every several years with the appointment of a new Council chair and provided to every anesthesia training department and resident.
Second, the Council debated the feedback to be given to departments and residents. In a qualifying test, only a score is required, but to have educational value, feedback had to be more meaningful. Options ranged from no feedback to providing the entire examination with answers. The latter was rejected for two reasons: all of the questions would be lost for future use and, more importantly, giving the questions and answers would suggest that those 350 items reflected all one had to know about the domain, a flawed conclusion. The Council settled on "keyword" feedback. For each item answered incorrectly, the resident would receive a keyword phrase describing the essence, rather than the point, of the question. This directed attention to an area of study rather than to the more circumscribed correct answer to a specific question. For example, for a question dealing with drugs that do and do not cross the placenta, with the correct answer being a muscle relaxant, the keyword phrase might be "placental transfer of relaxants." To avoid overwhelming junior residents with too many keyword phrases, feedback was limited to questions answered correctly by at least half of the graduating (CA3) residents. Keywords were created in advance, but could be modified if the responses to a question suggested a deficiency in knowledge different from that expected. Program directors could therefore easily see what proportion of their residents incorrectly answered a question related to a specific keyword phrase, and draw conclusions about deficits in their departmental curriculum.