This study investigated the significance of semantics in computer-generated word frequency counts, in response to a call for new word lists (Read, 2000; Gardner, 2007). Read claims that no corpus projects to date have produced any "definitive, stand-alone word-frequency lists" (p. 226). Many researchers are concerned that the concept of a word is never clearly defined in most studies dealing with word frequency counts, and the research makes clear that no universally accepted construct of the word exists. In fact, many past word frequency counts examined only word forms, without considering word meanings or the possible effects of homography on the resulting lists. Ming-Tzu and Nation (2004) addressed some of these criticisms in their research on the Academic Word List (AWL), evaluating the extent of homography throughout the AWL. However, words in the AWL are often not among the highest-frequency word forms in English. The present study focuses on high-frequency words. It evaluates a random sample of 46 lemmas that occur at least 1500 times in the British National Corpus (BNC). For each lemma, a further random sample of 200 examples in context was semantically analyzed and tallied: 100 examples from the written portion of the corpus and 100 from the spoken portion. The list of meanings for each word was compiled from conflated WordNet senses, supplemented by some additional senses. Each context was double-rated and sometimes triple-rated. The results indicate that the impact of using semantic rather than form-based frequency is considerable. The study suggests that homography tends to be extensive in many high-frequency word forms, across major registers of the language, and within each of the four major parts of speech. It further suggests that basing frequency on semantics would considerably alter the content of a high-frequency word list.



homography, word lists, high frequency, vocabulary lists, ESL, text coverage, word coverage, written vs. spoken

