Abstract
The World-Wide Web contains a vast amount of information, and reading through web pages to collect it is tedious, time-consuming, and error-prone. Users need an automated way to extract or highlight the data they are interested in. A regular expression that matches the text of interest can automate the process, but regular expressions are hard to write and are not feasible for non-programmers to construct. Text Identification by Example (TIBE) makes it easier for end users to harvest information from the web and other text documents. With TIBE, users train text classifiers from positive and negative examples that they select, rather than writing regular expressions by hand. The trained classifiers can then be used to extract or highlight text on web pages.
Degree
MS
College and Department
Physical and Mathematical Sciences; Computer Science
Rights
http://lib.byu.edu/about/copyright/
BYU ScholarsArchive Citation
Preece, Daniel Joseph, "Text Identification by Example" (2007). Theses and Dissertations. 1184.
https://scholarsarchive.byu.edu/etd/1184
Date Submitted
2007-08-02
Document Type
Thesis
Handle
http://hdl.lib.byu.edu/1877/etd2060
Keywords
pattern matching, text identification, examples, concept modeling, classifier, training
Language
English