Abstract

Many-facets Rasch modeling (MFRM) is an effective way to compensate for rater effects (e.g., rater leniency/severity) in the ratings assigned during rater-mediated assessment. Although a fully crossed MFRM rating design, in which all raters rate every rating object, is the ideal from a psychometric standpoint, fully crossed designs are rarely implemented in practice because of the resource burden they impose on assessment personnel. Thus, incomplete rating designs, in which each rater rates only a subset of the rating objects, are the norm. However, unless assessment personnel consciously attend to a variety of aspects of rating design, incomplete rating designs may suffer from design flaws that can cause problems for the MFRM analysis of the ratings, making fair comparisons across ratings problematic. The literature provides minimal insight into how various aspects of rating design affect MFRM analysis of the ratings, so it is unclear which combinations of incomplete rating design components are likely to yield the most psychometrically useful ratings while also minimizing the resource load imposed by the design. Following many aspects of the approach pioneered by McEwen (2018), ratings from a fully crossed, rater-mediated university general education (GE) assessment, with 72 student work samples (SWS) and six raters, were used to sample 75 subsets of ratings, and MFRM analysis was conducted on each subset using Linacre's FACETS software. Three main rating design attributes were varied across these subsets: (a) design type (spiral, block, random); (b) components of design structure (rater coverage, linkage, balance, repetition size); and (c) approaches to misalignment between SWS and rubric dimensions (blanks, zeros, averages computed from blanks, averages computed from zeros, and imputation of a half-credit constant). MFRM analyses were used to investigate the effects of these rating design attributes on the resulting ratings. Model reliability for raters was measured, and observed and fair average ratings from the 75 MFRM analyses were compared to the fair averages of the fully crossed design (FFAs). Visualizations were used to aid in understanding the effects of design attributes on MFRM results, and deviations of fair and observed averages from, and their correlations with, the FFAs were computed. Model reliability for raters was poor across all incomplete rating designs. Consistency of incomplete design fair averages with the FFAs varied widely across rating design attributes. Design attributes that yielded the most reliable ratings included (a) block and spiral rating design types; (b) higher levels of rater coverage, linkage, and repetition size; and (c) blanks and zeros as approaches to misaligned rubric dimensions. Redundant links were identified in all incomplete designs at high rater-coverage levels, suggesting that rating designs with lower rater-coverage levels may be practically equivalent to the same designs at high coverage levels.
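
To make the rater-severity compensation concrete, one common formulation of a three-facet rating scale MFRM (a standard form from the Rasch literature, not quoted from the dissertation) models the log-odds of SWS n receiving rating category k rather than k-1 on rubric dimension i from rater j as

    \log \frac{P_{nijk}}{P_{nij(k-1)}} = B_n - D_i - C_j - F_k

where B_n is the proficiency reflected in SWS n, D_i is the difficulty of rubric dimension i, C_j is the severity of rater j, and F_k is the difficulty of category k relative to category k-1. Fair averages are model-expected ratings computed with the estimated rater severities C_j removed, which is what allows comparisons across raters of differing severity.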
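As a minimal sketch of the comparison step described above, assuming the fair averages are available as arrays with one value per SWS (the simulated values and the names ffa and subset_fair below are illustrative stand-ins, not the dissertation's actual FACETS output):

    import numpy as np

    # Illustrative stand-ins for FACETS output (assumption: one fair
    # average per SWS; values here are simulated, not real data).
    rng = np.random.default_rng(0)
    ffa = rng.uniform(1.0, 4.0, size=72)          # fully crossed fair averages (FFAs)
    subset_fair = ffa + rng.normal(0.0, 0.2, 72)  # fair averages from one incomplete design

    # Deviation from, and correlation with, the FFAs.
    mean_abs_dev = np.mean(np.abs(subset_fair - ffa))
    r = np.corrcoef(subset_fair, ffa)[0, 1]
    print(f"mean |deviation| = {mean_abs_dev:.3f}, r = {r:.3f}")

The same computation applies to the observed averages, substituting the observed average array for subset_fair.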

Degree

PhD

College and Department

David O. McKay School of Education; Educational Inquiry, Measurement, and Evaluation

Rights

https://lib.byu.edu/about/copyright/

Date Submitted

2024-08-15

Document Type

Dissertation

Handle

http://hdl.lib.byu.edu/1877/etd13356

Keywords

general education, university assessment, rater-mediated assessment, many-facets Rasch model, rater severity, reliability

Language

English

Included in

Education Commons
