The use of rating scales in the evaluation of secondary teacher performance has been called into question and widely criticized. Of particular concern has been the use of student ratings of teacher performance. A review of the instruments and practices used in the rating process reveals serious design flaws that account for many of these criticisms. This study sought to address the limitations evident in previous rating efforts by combining design methodologies and measurement models, including elements of Classical Test Theory (CTT), factor analysis, and Item Response Theory (IRT). The IRT model employed was the one-parameter logistic model, also known as the Rasch model. Twelve scales comprising a total of ninety-two items were developed to facilitate student ratings of secondary-level teachers of religion in the Church Educational System (CES) of the Church of Jesus Christ of Latter-day Saints (LDS). In addition to exploring rating scale design methodology and scale performance, this study examined a potential threat to the validity of decisions based on ratings, known as the halo effect. Using a variety of approaches to operationally define and estimate halo error, the study examined the extent to which male and female students exhibit differing degrees of halo in their ratings of teachers. The results revealed that of the twelve teacher traits hypothesized in the design of the rating scales, only three met defensible criteria based on CTT and Rasch model standards: the Student-Teacher Rapport Scale (STRS), the Scripture Mastery Expectation Scale (SMES), and the Spiritual Learning Environment Scale (SLES). Secondary students were unable to discriminate meaningfully among all twelve traits.
Traditional approaches to halo effect estimation suggested that males exhibited halo to a greater degree than females, whereas Rasch model approaches yielded less consistent results. Considered together, however, the evidence suggests differential halo error by gender, with males exhibiting halo to a greater degree than females. The implications of these findings for teacher evaluation, instructional design, and future research are also addressed.



Keywords: rating scales, teacher evaluation, halo effect, scale construction, Rasch model