Educational Measurement Workshop: A Sweet Approach to Understanding the Basic Principles of Educational Measurement
Medical College of Wisconsin
Peer reviewed
Increasingly, medical educators are expected to design, implement, and analyze the results of an array of learner assessment strategies ranging from multiple-choice examinations to human (e.g., standardized patients) and mechanical simulations. However, many of our faculty have limited training focused on the core principles of educational measurement and learner assessment.
These principles, including reliability, validity, and sources of error (e.g., rater biases, administration), are presented in a 2.5- to 3-hour workshop using chocolate as a central focus. During this memorable and engaging workshop, participants identify key attributes of excellence (in chocolate), develop rating scales, train raters, rate the chocolates, and then report their scores to examine rater consistency. At the conclusion of the workshop, the basic principles of educational measurement and the methods to control common sources of error (e.g., instrumentation) are discussed in the context of common learner assessment methods.
The learner will be able to:
- Define basic terminology associated with educational measurement and learner assessment, including reliability, validity, inter-rater agreement, and normative and criterion-based assessment.
- Identify common sources of measurement errors that threaten the integrity of learner assessment measures.
- Identify strategies to correct/control for errors with particular emphasis on those attributable to raters and instrumentation.
- Apply measurement principles to address medical education related learner assessment issues.
- The instructor needs a strong background in, and medical education-related working knowledge of, the basic principles of educational measurement and associated methods to control common sources of error.
- Ability to draw on examples from a range of specialties/disciplines and training levels, including facility with:
  - ACGME Competencies, Assessment Requirements for Accreditation and Toolkit;
  - USMLE and credentialing boards (e.g., ABIM).
- Teaching skills of a skilled/experienced large-group instructor:
  - Ability to instruct using Microsoft Office PowerPoint software.
  - Ability to interact with learners across specialties and facilitate large-group discussion.
  - Ability to promote small-group interaction, monitor time on task, and direct tasks as needed to stay within schedule.
  - Ability to prepare materials and the room, or flexibility to use an established room set-up.
We use a common, universally familiar subject (chocolate) as the target of performance assessment so participants focus on the instrumentation issues. Since multiple sources of variation and error (e.g., inter-rater taste preferences, rater fatigue) exist even with as simple a construct as chocolate, we provide concrete examples of these as threats to assessment validity. Because the assessment is relatively brief, we are able to quickly summarize the data and interpret simple test statistics (e.g., inter-rater reliability) on the spot. You could have a staff person help with collection of rating forms and quick summary of data while discussion of common errors ensues.
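As a rough illustration of the kind of on-the-spot summary described above, the sketch below tallies rating forms and reports how closely raters agree on each chocolate. Everything in it is an assumption made for illustration: the chocolate labels, the 1-5 scale, the scores, and the function names are hypothetical, and simple pairwise percent agreement is only one of several possible indices of rater consistency, not necessarily the statistic used in the workshop.

```python
from itertools import combinations
from statistics import mean

# Hypothetical rating forms: one list of scores per chocolate,
# one score per rater, on an assumed 1-5 scale.
ratings = {
    "chocolate_A": [4, 4, 5, 4],
    "chocolate_B": [2, 3, 2, 2],
    "chocolate_C": [5, 3, 4, 1],
}


def pairwise_agreement(scores, tolerance=0):
    """Share of rater pairs whose scores differ by at most `tolerance` points."""
    pairs = list(combinations(scores, 2))
    agreeing = sum(1 for a, b in pairs if abs(a - b) <= tolerance)
    return agreeing / len(pairs)


def summarize(ratings):
    """Per-item mean, range, and exact/adjacent agreement across raters."""
    summary = {}
    for item, scores in ratings.items():
        summary[item] = {
            "mean": mean(scores),
            "range": max(scores) - min(scores),
            "exact_agreement": pairwise_agreement(scores),
            "adjacent_agreement": pairwise_agreement(scores, tolerance=1),
        }
    return summary


summary = summarize(ratings)
for item, stats in summary.items():
    print(
        f"{item}: mean={stats['mean']:.2f}, range={stats['range']}, "
        f"exact={stats['exact_agreement']:.0%}, "
        f"within 1 point={stats['adjacent_agreement']:.0%}"
    )
```

In this made-up data set, raters land within one point of each other on the first two chocolates but spread across four points on the third, which is the kind of item that could prompt discussion of rater bias or an unclear rating scale.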
The presenter must continually speak to the relevance and transfer of the material to medical student education. If the participants are not well known to the instructor, a brief introduction of their roles and assessment responsibilities might help make the lessons personal. Variability among the chocolates will serve as a source of variation for the discussion, some of it anticipated and some not (much as with students). While participants are sampling the chocolates, instructors should note sources of variability according to three of the four sources of error used to ground the workshop: instrumentation (clarity of rating scales); raters (fatigue, bias toward a particular chocolate, contamination from another food or drink); and administration (clarity of instructions, order of tasting, chocolate sample size), along with any other sources that introduce error into the measurement. These observations (order, size, style of sampling, drinks) are then used during the discussion.
To help learners transfer this information to their faculty roles, find relevant examples of assessment tools from your own institution. We use a detailed Likert scale rating form from our clinical clerkship as an example and have participants reflect on possible applications of this material in their own jobs.
