However, methods to evaluate the clinical-ethical abilities of medical students, post-graduate trainees, and practising physicians are not well developed. By clinical-ethical ability, we mean the ability to identify, analyze and attempt to resolve ethical problems arising in the practice of medicine. Several evaluation methods have been used, including multiple-choice and true/false questions (Howe and Jones, 1984), case write-ups (Siegler et al, 1982; Doyal et al, 1987; Redmon, 1989; Hebert et al, 1990), audio-taped interviews with standardized patients (Miles et al, 1990), and instruments based on Kohlberg's cognitive moral development theory (Self et al, 1989).
The reliability and validity of these methods have seldom been examined. Of particular concern is the relevance of these evaluation methods to actual clinical practice. To develop a clinically sensible method to evaluate clinical-ethical abilities, we applied the methodology of the OSCE (Cohen et al, 1991).
The individual stations have adequate inter-rater reliability. The mean inter-rater reliability (intraclass correlation coefficient) of ten ethics stations in the 1992 and 1993 EFPO OSCEs was 0.66.
The face/content validity of the stations is supported by the method we used to develop them (see below, "Development and Implementation"). Rather than asking "experts" to state whether our scoring criteria appeared valid, we videotaped the performances of expert clinicians in the actual standardized patient roles. The scoring criteria for the stations are based on these performances, as well as the input of a single clinician-bioethicist. In the future, it would be desirable to enhance the face/content validity of the scoring criteria of our stations by having them reviewed and modified by an interdisciplinary expert panel (Arnold, 1993).
To examine the construct validity of the ethics OSCE stations, we hypothesised that residents would score higher than medical students. We tested this hypothesis in the 1992 EFPO OSCE, and it was confirmed (F=2.24, p=0.046). This finding lends some support to the construct validity of the ethics OSCE stations.
Since there is no accepted "gold standard" for ethical behaviour, we could not examine the criterion validity of the ethics OSCE.
The primary psychometric characteristic limiting the ethics OSCE is internal consistency reliability of scores across stations. Across the six ethics stations in the 1992 EFPO OSCE, the internal consistency reliability (Cronbach's alpha) was 0.46. Using the Spearman-Brown Prophecy formula, we can calculate that it would likely require 28 stations to provide a reliable (Cronbach's alpha ≥ 0.8) overall ethics score. To examine the possibility that a reliable overall score could be obtained for a subdomain of bioethics, we included four stations on decisions to forgo treatment in the 1993 EFPO OSCE. Internal consistency reliability of scores across the four stations (Cronbach's alpha) was 0.28. By calculation using the Spearman-Brown Prophecy formula, to achieve an internal consistency reliability of α = 0.8, 41 stations (almost 7 hours of testing time) would be required.
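The station counts quoted above can be reproduced with a short calculation. The following is a minimal sketch, assuming the standard form of the Spearman-Brown prophecy formula and rounding to the nearest whole station; the function names are illustrative and are not taken from the original analysis.

```python
def predicted_reliability(alpha_observed, length_factor):
    """Spearman-Brown: reliability of a test lengthened by length_factor."""
    return (length_factor * alpha_observed) / (1 + (length_factor - 1) * alpha_observed)


def stations_needed(alpha_observed, n_stations, alpha_target=0.8):
    """Number of stations predicted to reach alpha_target.

    Solves the Spearman-Brown formula for the lengthening factor and
    multiplies it by the current number of stations. Returns a float;
    the caller decides how to round.
    """
    if not (0 < alpha_observed < 1 and 0 < alpha_target < 1):
        raise ValueError("reliabilities must lie strictly between 0 and 1")
    factor = (alpha_target * (1 - alpha_observed)) / (alpha_observed * (1 - alpha_target))
    return factor * n_stations


# 1992 EFPO OSCE: six ethics stations, alpha = 0.46
print(round(stations_needed(0.46, 6)))  # 28
# 1993 EFPO OSCE: four stations on forgoing treatment, alpha = 0.28
print(round(stations_needed(0.28, 4)))  # 41
```

The unrounded results are about 28.2 and 41.1 stations, so rounding up instead of to the nearest whole station would give 29 and 42.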
In summary, the ethics OSCE has adequate inter-observer reliability, face/content validity, and construct validity. The face/content validity could be improved through review and modification of the scoring criteria by an interdisciplinary expert panel. The internal consistency reliability of scores across stations is inadequate. This problem is not unique to the ethics OSCE; internal consistency reliability is a limiting psychometric characteristic inherent in the OSCE methodology itself.
The primary disadvantage of the ethics OSCE is its major psychometric limitation: low internal consistency reliability of scores across stations, even when the examination is focussed on a subdomain of bioethics.
We therefore recommend a multi-method approach to the evaluation of bioethics. The examinations should include OSCE stations with standardized patients. This lends validity to the evaluation because it examines clinical skills and interactions with patients, which is what we want to measure. The examinations should also include other evaluation methods, such as multiple-choice or short-answer questions, to boost the reliability of the overall examination.
We have developed not only these 14 cases, but also a method to develop ethics OSCE cases. This will be especially useful for those who wish to develop their own cases. In brief, the cases were developed as follows. Based on cases described to us by colleagues, or actual legal cases, we drafted instructions to the candidate and a script for the standardized patient. We reviewed each case to identify key concepts that candidates would be expected to understand; prompts were built into the standardized patients' scripts to ensure that the candidate would have an opportunity to demonstrate knowledge of these concepts. Standardized patients, chosen to match the age and gender of the patient in the case, were trained to portray the cases accurately; special emphasis was placed on the consistent use of correctly timed prompts.
Candidates received an ethics score for each station. The ethics score was based on specific 8-10 item checklists developed for each station. To develop the ethics checklists, we videotaped the performances of about 5 staff physicians, who played the role of the candidate and interacted with the standardized patient, in each of the stations. We then reviewed and transcribed the videotapes and identified the comments most commonly mentioned by the attending physicians. Those comments that were commonly mentioned and, in the opinion of a clinician-bioethicist, consistent with the key bioethical concepts tested by the station, became items on the ethics checklist. The draft checklists were pilot tested. Each item on the checklist is marked as "done" or "not done", and the scores are transformed to percentages.
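The checklist scoring described above can be sketched as follows. This is an illustration only: the function name and the example marks are hypothetical, and the only facts assumed from the text are that each item is marked "done" or "not done" and that the score is expressed as a percentage.

```python
def checklist_score(items_done):
    """Percentage of checklist items marked "done".

    items_done is a list of booleans, one per checklist item
    (True = "done", False = "not done").
    """
    if not items_done:
        raise ValueError("checklist must contain at least one item")
    return 100.0 * sum(items_done) / len(items_done)


# Hypothetical candidate on a 9-item checklist: 6 items done, 3 not done.
marks = [True, True, False, True, True, False, True, False, True]
print(f"{checklist_score(marks):.1f}%")  # 66.7%
```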
In conclusion, we have applied the OSCE technology to evaluation of bioethics. This booklet describes our 14 stations. The accompanying videotape portrays the scenarios. Because of the low internal consistency reliability of the ethics OSCE, we recommend a multi-method approach to the evaluation of bioethics. Although the focus of this chapter has been evaluation, in our experience, these ethics cases using standardized patients are even more useful for teaching bioethics to medical students and post-graduate trainees (Pellegrino et al, 1990). This may turn out to be their most fruitful use.
Baylis F, Downie J. "Undergraduate medical ethics education: A survey of Canadian medical schools." London, ON: Westminster Institute, 1990.
"Biomedical Ethics Committee approves document on postgraduate teaching of biomedical ethics." Annals of the Royal College of Physicians and Surgeons of Canada 1989; 22: 560.
Cohen R, Singer PA, Rothman AI, Robb A. "Assessing competency to address ethical issues in medicine." Academic Medicine 1991; 66: 14-5.
Doyal L, Hurwitz B, Yudkin JS. "Teaching medical ethics symposium: Medical ethics and the clinical curriculum: a case study." Journal of Medical Ethics 1987; 13: 144-149.
Hebert P, Meslin EM, Dunn EV, Byrne N, Reid SR. "Evaluating ethical sensitivity in medical students: using vignettes as an instrument." Journal of Medical Ethics 1990; 16: 141-145.
Howe KR, Jones MS. "Techniques for Evaluating Student Performance In a Preclinical Medical Ethics Course." Journal of Medical Education 1984; 59: 350-352.
Miles SH, Lane LW, Bickel J, Walker RM, Cassel CK. "Medical ethics education: Coming of age." Academic Medicine 1989; 64: 705-14.
Miles SH, Bannick-Mohrland S, Lurie N. "Advance-treatment planning discussions with nursing home residents: pilot experience with simulated interviews." Journal of Clinical Ethics 1990; 2: 108-112.
Pellegrino ED, Siegler M, Singer PA. "Teaching clinical ethics." Journal of Clinical Ethics 1990; 1 (3): 175-80.
Redmon RB. "A medical ethics project for third-year medical students." Academic Medicine 1989; 64: 266-270.
Scott CS, Barrows HS, Brock DM, Hunt DD. "Clinical behaviors and skills that faculty from 12 institutions judged were essential for medical students to acquire." Academic Medicine 1991; 66: 106-11.
Self DJ, Wolinsky FD, Baldwin DC. "The Effect of Teaching Medical Ethics on Medical Students' Moral Reasoning." Academic Medicine 1989 (December): 755-59.
Siegler M, Rezler AG, Connell KJ. "Using Simulated Case Studies To Evaluate A Clinical Ethics Course for Junior Students." Journal of Medical Education 1982; 57: 380-385.
Singer PA, Cohen R, Robb A, Rothman AI. "The ethics objective structured clinical examination (OSCE)." J Gen Intern Med 1993; 8: 23-8.
Singer PA, Robb A, Cohen R, Norman G, Turnbull J. "Evaluation of a multicentre ethics objective structured clinical examination." J Gen Intern Med 1994, in press.
Subcommittee on Evaluation of Humanistic Qualities in the Internist, American Board of Internal Medicine. "Evaluation of humanistic qualities in the internist." Ann Intern Med 1983; 99: 720-4.