Keywords
information search, phrase matching, clustering, fuzzy-set IR model
Abstract
As the number of RSS news feeds on the Internet continues to increase, it becomes necessary to minimize the workload of the user, who is otherwise required to scan through huge numbers of news articles to find related articles of interest, a tedious and often impossible task. To solve this problem, we present a novel approach, called InFRSS, which consists of a correlation-based phrase matching (CPM) model and a fuzzy compatibility clustering (FCC) model. CPM detects RSS news articles containing phrases that are the same or semantically alike, and determines the degree of similarity of any two articles. FCC identifies and clusters non-redundant, closely related RSS news articles based on their degrees of similarity and a fuzzy compatibility relation. Experimental results show that (i) our CPM model, when matching bigrams and trigrams in RSS news articles, outperforms other phrase/keyword-matching approaches and (ii) our FCC model generates high-quality clusters and outperforms other well-known clustering techniques.
Original Publication Citation
Maria Soledad Pera and Yiu-Kai Ng. "Utilizing Phrase-Similarity Measures for Detecting and Clustering Informative RSS News Articles." Journal of Integrated Computer-Aided Engineering (ICAE), Volume 15, Number 4, pp. 331-350, 2008, IOS Press. (This is an extended version of the KSEM 2007 paper, "Finding Similar RSS News Articles Using Correlation-Based Phrase Matching," which was selected and invited to be published by ICAE.)
BYU ScholarsArchive Citation
Ng, Yiu-Kai D. and Pera, Maria Soledad, "Utilizing Phrase-Similarity Measures for Detecting and Clustering Informative RSS News Articles" (2008). Faculty Publications. 948.
https://scholarsarchive.byu.edu/facpub/948
Document Type
Peer-Reviewed Article
Publication Date
2008-01-01
Permanent URL
http://hdl.lib.byu.edu/1877/2635
Publisher
IOS Press
Language
English
College
Physical and Mathematical Sciences
Department
Computer Science
Copyright Status
© 2007 IOS Press.
Copyright Use Information
http://lib.byu.edu/about/copyright/