Prior studies suggest that raters' familiarity with test-takers' first language (L1) can be a source of bias in rating speaking tests. However, researchers have not yet reached a consensus on how, and to what extent, that familiarity affects scores. This study investigates rater performance, focusing not only on how raters' second language (L2) proficiency interacts with examinees' L1 but also on whether raters' teaching experience affects the scores. Speaking samples from 58 ESL learners of varying proficiency levels, with L1s of Spanish (n = 30) and three Asian languages (Korean, n = 12; Chinese, n = 8; and Japanese, n = 8), were rated by 16 trained raters with varying levels of Spanish proficiency (Novice to Advanced) and different amounts of teaching experience (from one to over 10 semesters). The ratings were analyzed using Many-Facet Rasch Measurement (MFRM). The results suggest that extensive rater training can be quite effective: neither raters' familiarity with examinees' L1 nor raters' teaching experience had a significant effect on the scores. Even after training, however, the raters still exhibited different degrees of leniency/severity. The main conclusion of this study, therefore, is that even trained raters may consistently rate differently. The recommendation is to (a) provide further rater training and calibration and/or (b) use MFRM with fair averages to compensate for this variance.



College and Department

Humanities; Linguistics and English Language




Keywords

language testing, rater bias, speaking tests, oral proficiency, language learning background, accented speech

Included in

Linguistics Commons