For decades, vocabulary size tests have been built upon the idea that if a test-taker knows enough words at a given level of frequency based on a list from corpus, they will also know other words of that approximate frequency as well as all words that are more frequent. However, many vocabulary size tests are based on corpora that are as out-of-date as 70 years old and that may be ill-suited for these tests. Based on these potentially problematic areas, the following research questions were asked. First, to what degree would a vocabulary size test based on a large, contemporary corpus be reliable and valid? Second, would it be more reliable and valid than previously designed vocabulary size tests? Third, do words across, 1,000-word frequency bands vary in their item difficulty? In order to answer these research questions, 403 ESL learners took the Vocabulary of American English Size Test (VAST). This test was based on a words list generated from the Corpus of Contemporary American English (COCA). This thesis shows that COCA word list might be better suited for measuring vocabulary size than lists used in previous vocabulary size assessments. As a 450-million-word corpus, it far surpasses any corpus used in previously designed vocabulary size tests in terms of size, balance, and representativeness. The vocabulary size test built from the COCA list was both highly valid and highly reliable according to a Rasch-based analysis. Rasch person reliability and separation was calculated to be 0.96 and 4.62, respectively. However, the most significant finding of this thesis is that frequency ranking in a word list is actually not as good of a predictor of item difficulty in a vocabulary size assessment as perhaps researchers had previously assumed. A Pearson correlation between frequency ranking in the COCA list and item difficulty for 501 items taken from the first 5,000 most frequent words was 0.474 (r^2 = 0.225) meaning that frequency rank only accounted for 22.5% of the variability of item difficulty. The correlation decreased greatly when item difficulty was correlated against bands of 1,000 words to a weak r = 0.306, (r^2 = 0.094) meaning that 1,000-word bands of frequency only accounts for 9.4% of the variance. Because frequency is a not a highly accurate predictor of item difficulty, it is important to reconsider how vocabulary size tests are designed.



College and Department

Humanities; Linguistics and English Language



Date Submitted


Document Type





vocabulary size, vocabulary assessment, vocabulary breadth, vocabulary level, language testing, test design



Included in

Linguistics Commons