Abstract

This dissertation examines the evolution and educational implementation of Automated Essay Scoring (AES) systems, with particular focus on the integration of generative artificial intelligence into educational assessment. Through a two-article format, this research addresses the persistent gap between technical capability and educational adoption that has characterized AES development over seven decades. The first article presents a comprehensive methodological review analyzing 122 studies spanning AES evolution from foundational computational linguistics through contemporary hybrid systems. Despite achieving strong correlations with human scoring, modern AES systems face resistance due to validity concerns and misalignment with pedagogical values, revealing an "adoption paradox" in which technical sophistication often undermines educator trust. The second article reports empirical findings from evaluating four state-of-the-art generative AI models against human evaluators using 4,819 scoring instances from 1,290 student submissions across 31 instructors. Results demonstrate systematic differences between human and AI scoring patterns, with AI models providing more conservative scores than human evaluators, who exhibit significant leniency bias. A teacher-level calibration framework successfully addressed these systematic differences, improving human-AI agreement from 53% to over 82%. Qualitative analysis reveals that faculty responses follow a predictable technology adoption cycle, with initial enthusiasm giving way to disillusionment over technical limitations before eventual acceptance grounded in realistic expectations and proper calibration. The dissertation concludes that effective AES implementation requires validity-centered, human-AI collaborative systems that prioritize transparency and pedagogical alignment over computational supremacy. Five critical research priorities are identified: multi-rater model orchestration, chain-of-thought validation, adaptive human-AI collaboration protocols, scalable implementation architectures, and multimodal document assessment capabilities. This research contributes to developing trustworthy assessment systems that augment rather than replace human expertise in writing instruction.

Degree

PhD

College and Department

David O. McKay School of Education; Educational Inquiry, Measurement, and Evaluation

Rights

https://lib.byu.edu/about/copyright/

Date Submitted

2025-08-12

Document Type

Dissertation

Keywords

AES, automated essay scoring, writing assessment, AI grading, educational technology, rubric-based scoring, automated writing evaluation, ChatGPT, large language models

Language

English

Included in

Education Commons
