The World Wide Web contains a great deal of information, and reading through web pages to collect it is tedious, time-consuming, and error-prone. Users need an automated way to extract or highlight the data they are interested in. Writing a regular expression that matches the desired text automates the process, but regular expressions are hard to create and are not feasible for non-programmers to construct. Text Identification by Example (TIBE) makes it easier for end users to harvest information from the web and other text documents. With TIBE, users train text classifiers from positive and negative examples they select, instead of writing regular expressions by hand. The trained classifiers can then extract or highlight text on web pages.
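To make the idea of "learning a pattern from examples instead of writing a regular expression" concrete, here is a deliberately tiny Python sketch. It does not reproduce TIBE's classifiers. The names `shape`, `ShapeClassifier`, `TOKEN`, and the `[[...]]` highlight markup are hypothetical, chosen only for illustration. The toy learner generalizes each example token to an ASCII character-class "shape" (digits become `9`, uppercase letters `A`, lowercase letters `a`). It accepts tokens whose shape appeared among the positive examples and never among the negative ones, and it can print the regular expression those examples implicitly stand in for.

```python
import re

# Hypothetical tokenizer: runs of word characters and hyphens.
TOKEN = re.compile(r"[\w-]+")


def shape(token: str) -> str:
    """Generalize a token to its ASCII character-class shape:
    digits -> '9', uppercase -> 'A', lowercase -> 'a', anything else kept."""
    out = []
    for ch in token:
        if "0" <= ch <= "9":
            out.append("9")
        elif "A" <= ch <= "Z":
            out.append("A")
        elif "a" <= ch <= "z":
            out.append("a")
        else:
            out.append(ch)
    return "".join(out)


class ShapeClassifier:
    """Toy example-trained classifier (hypothetical, not TIBE's algorithm).

    A token is accepted when its shape occurs among the positive examples
    and never among the negative examples.
    """

    def __init__(self) -> None:
        self.positive_shapes: set[str] = set()
        self.negative_shapes: set[str] = set()

    def train(self, positives, negatives) -> None:
        self.positive_shapes.update(shape(t) for t in positives)
        self.negative_shapes.update(shape(t) for t in negatives)

    def accepted_shapes(self) -> set[str]:
        # Shapes seen in negative examples veto identical positive shapes.
        return self.positive_shapes - self.negative_shapes

    def accepts(self, token: str) -> bool:
        return shape(token) in self.accepted_shapes()

    def extract(self, text: str) -> list[str]:
        """Return every token in `text` that the classifier accepts."""
        return [m.group() for m in TOKEN.finditer(text) if self.accepts(m.group())]

    def highlight(self, text: str) -> str:
        """Wrap accepted tokens in [[...]], leaving the rest of the text unchanged."""
        return TOKEN.sub(
            lambda m: f"[[{m.group()}]]" if self.accepts(m.group()) else m.group(),
            text,
        )

    def to_regex(self) -> str:
        """Build the regular expression the examples implicitly describe."""
        classes = {"9": "[0-9]", "A": "[A-Z]", "a": "[a-z]"}
        alternatives = [
            "".join(classes.get(ch, re.escape(ch)) for ch in s)
            for s in sorted(self.accepted_shapes())
        ]
        # "(?!)" never matches, so an untrained classifier accepts nothing.
        return "(?:" + "|".join(alternatives) + ")" if alternatives else "(?!)"
```

For example, after `train(["555-0199", "801-4221"], ["2007-05"])`, calling `highlight("Call 555-0142 or 555-98765 before 2007-05")` returns `"Call [[555-0142]] or 555-98765 before 2007-05"`. The exact-shape rule rejects the five-digit suffix, which shows how narrowly such a naive learner generalizes; a practical example-driven system would need a more capable learner than exact shape matching.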
College and Department
Physical and Mathematical Sciences; Computer Science
BYU ScholarsArchive Citation
Preece, Daniel Joseph, "Text Identification by Example" (2007). All Theses and Dissertations. 1184.
Keywords
pattern matching, text identification, examples, concept modeling, classifier, training