Abstract

The World-Wide Web contains a lot of information and reading through the web pages to collect this information is tedious, time consuming and error prone. Users need an automated solution for extracting or highlighting the data that they are interested in. Building a regular expression to match the text they are interested in will automate the process, but regular expressions are hard to create and certainly are not feasible for non-programmers to construct. Text Identification by Example (TIBE) makes it easier for end-users to harvest information from the web and other text documents. With TIBE, training text classifiers from user-selected positive and negative examples replaces the hand-writing of regular expressions. The text classifiers can then be used to extract or highlight text on web pages.

Degree

MS

College and Department

Physical and Mathematical Sciences; Computer Science

Rights

http://lib.byu.edu/about/copyright/

Date Submitted

2007-08-02

Document Type

Thesis

Handle

http://hdl.lib.byu.edu/1877/etd2060

Keywords

pattern matching, text identification, examples, concept modeling, classifier, training

Share

COinS