Journal of Undergraduate Research
Keywords
memorized patterns, word order, unsupervised language learning, natural language
College
Physical and Mathematical Sciences
Department
Computer Science
Abstract
Despite the ever-increasing abilities of computers, natural language analysis remains a challenge. The intricacies of natural language are far too many to enumerate, which has given rise to automated algorithms that learn how the language is used from large text corpora. Many current methods use complex statistical approaches involving multi-dimensional vectors and factor analysis and, admittedly, fail to take into account valuable contextual information such as word order. This project continued the development of a novel, simple, unsupervised learning approach for determining word similarity based on context in recurring word sequences; previous work confirms that similar words are used in similar contexts. The algorithm reads raw, untagged text, recording repeated word-use patterns and their frequencies. These patterns preserve the context of words by remembering where each word appears in relation to those surrounding it. By examining the context of a target word, the algorithm can find other words or phrases with similar meaning in similar contexts.
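The idea of recording repeated word sequences and comparing the contexts they provide can be illustrated with a minimal sketch. This is not the authors' implementation: the fixed pattern length of three words, the `min_count` threshold, the `_` slot marker, the overlap-based score, and all function names are assumptions chosen only to make the example small and runnable.

```python
from collections import Counter, defaultdict


def repeated_patterns(tokens, n=3, min_count=2):
    """Count every n-word sequence and keep those seen at least min_count times.

    Assumption: a fixed window of n words stands in for a "pattern".
    """
    counts = Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return {pattern: c for pattern, c in counts.items() if c >= min_count}


def context_index(patterns):
    """Map each context (a pattern with one slot blanked) to the words seen in that slot.

    Assumption: "_" never occurs as a real token, so it can mark the blank slot.
    Word order is preserved because the slot position is part of the context.
    """
    index = defaultdict(Counter)
    for pattern, count in patterns.items():
        for i, word in enumerate(pattern):
            context = pattern[:i] + ("_",) + pattern[i + 1:]
            index[context][word] += count
    return index


def similar_words(target, index):
    """Rank words that fill the same slots as target (hypothetical overlap score)."""
    scores = Counter()
    for fillers in index.values():
        if target in fillers:
            for word, count in fillers.items():
                if word != target:
                    # Credit the shared context by the smaller of the two frequencies.
                    scores[word] += min(count, fillers[target])
    return scores.most_common()


# Toy corpus: raw, untagged text split on whitespace.
text = ("the cat sat on the mat . the dog sat on the mat . "
        "the cat sat on the rug . the dog sat on the rug .")
tokens = text.split()
index = context_index(repeated_patterns(tokens))
print(similar_words("cat", index))
```

On this toy corpus, "dog" is the only word that shares repeated contexts with "cat" (for example, `the _ sat`), and "rug" is the only one that shares them with "mat". A real corpus would need tokenization, variable-length patterns, and a more careful similarity measure than this sketch provides.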
Recommended Citation
Hulet, Steve and Warnick, Dr. Sean (2013) "Synonymity of Memorized Patterns: Considering Patterns and Word Order in Unsupervised Language Learning," Journal of Undergraduate Research: Vol. 2013: Iss. 1, Article 2644.
Available at: https://scholarsarchive.byu.edu/jur/vol2013/iss1/2644