Abstract
Research on automatic readability prediction has grown over the last decade and has shown that various machine learning methods can address this problem effectively. Many researchers have applied machine learning to readability prediction for English, while Modern Standard Arabic (MSA) has received little attention. Here I describe a system that uses machine learning to automatically predict the readability of MSA texts. I gathered a corpus of 179 documents annotated with Interagency Language Roundtable (ILR) levels. Then, I extracted lexical and discourse features from each document. Finally, I applied the Tilburg Memory-Based Learner (TiMBL), a machine learning system, to these features to predict the ILR level of each document, using 10-fold cross-validation for both 3-level and 5-level classification tasks and an 80/20 split for a 5-level classification task. I measured performance using the F-score. For the 3-level and 5-level classifications, my system achieved F-scores of 0.719 and 0.519, respectively. I discuss the implications of these results and possibilities for future development.
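The evaluation loop described in the abstract (memory-based classification, 10-fold cross-validation, F-score) can be illustrated with a minimal sketch. This is not the thesis's code and does not call TiMBL: a plain nearest-neighbor classifier stands in for memory-based learning. The two numeric features, their values, the Manhattan distance, the index-based fold assignment, and the macro-averaged F-score are all assumptions made for illustration.

```python
from collections import Counter


def knn_predict(train, query, k=1):
    """Return the majority label among the k training items nearest to query.

    Distance is the sum of absolute feature differences (an assumption;
    TiMBL offers its own metrics and feature weighting).
    """
    ranked = sorted(
        train,
        key=lambda item: sum(abs(a - b) for a, b in zip(item[0], query)),
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def macro_f_score(gold, predicted):
    """Unweighted mean of per-class F1 over the classes present in gold."""
    scores = []
    for cls in sorted(set(gold)):
        tp = sum(1 for g, p in zip(gold, predicted) if g == cls and p == cls)
        fp = sum(1 for g, p in zip(gold, predicted) if g != cls and p == cls)
        fn = sum(1 for g, p in zip(gold, predicted) if g == cls and p != cls)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        total = precision + recall
        scores.append(2 * precision * recall / total if total else 0.0)
    return sum(scores) / len(scores)


def cross_validate(data, folds=10, k=1):
    """Hold out every folds-th item in turn, train on the rest, pool predictions."""
    gold, predicted = [], []
    for fold in range(folds):
        test = [item for i, item in enumerate(data) if i % folds == fold]
        train = [item for i, item in enumerate(data) if i % folds != fold]
        for features, label in test:
            gold.append(label)
            predicted.append(knn_predict(train, features, k))
    return macro_f_score(gold, predicted)


# Hypothetical toy corpus: 30 "documents", each a (features, level) pair.
# The feature values are invented and carry no linguistic meaning.
toy_corpus = [
    ((level * 10 + i % 3, level * 5 + i % 2), level)
    for i in range(10)
    for level in (1, 2, 3)
]

if __name__ == "__main__":
    print(cross_validate(toy_corpus, folds=10, k=1))
```

On this cleanly separated toy data the script prints 1.0. Real readability features overlap far more across levels, which is consistent with the lower F-scores reported above.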
Degree
MA
College and Department
Humanities; Linguistics and English Language
Rights
http://lib.byu.edu/about/copyright/
BYU ScholarsArchive Citation
Forsyth, Jonathan Neil, "Automatic Readability Detection for Modern Standard Arabic" (2014). Theses and Dissertations. 3983.
https://scholarsarchive.byu.edu/etd/3983
Date Submitted
2014-03-19
Document Type
Thesis
Handle
http://hdl.lib.byu.edu/1877/etd6863
Keywords
readability, Modern Standard Arabic, machine learning
Language
English