Abstract

Current web search engines, such as Google, Yahoo!, and Bing, rank the set of documents S retrieved in response to a user query Q and display each document with a title and a snippet, which serves as an abstract of the corresponding document in S. Snippets, however, are less useful than intended: they are meant to help search engine users quickly identify results of interest, if any exist, without browsing through the documents in S, but they (i) often include very similar information and (ii) do not capture the main content of the corresponding documents. Moreover, when the information need expressed in a search query is ambiguous, it is difficult, if not impossible, for a search engine to identify precisely the set of documents that satisfy the user's intended request. Furthermore, the title of a document retrieved by a web search engine is not always informative and is therefore not always a good indicator of the document's content. All of these design problems can be solved by our proposed query-based web informative summarization engine, denoted Q-WISE. Q-WISE clusters the documents in S, which allows users to view separate document collections organized by the specific topic each collection covers, and generates a concise and comprehensive summary for each cluster of documents. Q-WISE is also equipped with a query suggestion module that guides users in formulating a keyword query, which facilitates web search and improves the precision and recall of the search results. Experimental results show that Q-WISE is highly effective and efficient in generating a high-quality summary for each cluster of documents on a specific topic retrieved in response to a user's query. The empirical study also shows that Q-WISE's clustering algorithm is highly accurate, that the labels generated for the clusters are useful and often reflect the topic of the corresponding clustered documents, and that the performance of Q-WISE's query suggestion module is comparable to that of commercial web search engines.

Degree

MS

College and Department

Physical and Mathematical Sciences; Computer Science

Rights

http://lib.byu.edu/about/copyright/

Date Submitted

2011-03-15

Document Type

Thesis

Handle

http://hdl.lib.byu.edu/1877/etd4306

Keywords

clustering, summarization, query suggestion

Language

English
