Abstract

We present INCREMENT, a cluster refinement algorithm that uses user feedback to refine clusterings. INCREMENT can improve clusterings produced by arbitrary clustering algorithms. The initial clustering is first sub-clustered to improve query efficiency. A small set of selected instances from each sub-cluster is then presented to a user for labelling. Using this feedback, INCREMENT trains a feature embedder to map the input features to a new feature space. This space is learned such that spatial distance is inversely correlated with semantic similarity, as determined from the user feedback. A final clustering is then formed in the embedded space. INCREMENT is tested on 9 datasets initially clustered with 4 distinct clustering algorithms. INCREMENT improved the accuracy of 71% of the initial clusterings with respect to a target clustering. Across all experiments, the median percent improvement is 27.3% for V-Measure and 6.08% for accuracy.
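As a rough illustration of the reported metric, the following minimal sketch computes V-Measure (the harmonic mean of homogeneity and completeness) for a predicted clustering against a target clustering, together with a percent improvement between an initial and a refined score. The function names are hypothetical, and the relative-change formula in `percent_improvement` is an assumption; the abstract does not state how the improvement was computed.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (natural log) of a label assignment."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def conditional_entropy(targets, given):
    """H(targets | given) for two parallel label sequences."""
    n = len(targets)
    joint = Counter(zip(given, targets))
    given_counts = Counter(given)
    return -sum(
        (c / n) * math.log(c / given_counts[g]) for (g, _), c in joint.items()
    )


def v_measure(target, predicted):
    """Harmonic mean of homogeneity and completeness."""
    h_target = entropy(target)
    h_pred = entropy(predicted)
    # Homogeneity: each predicted cluster contains a single target class.
    homogeneity = (
        1.0 if h_target == 0
        else 1.0 - conditional_entropy(target, predicted) / h_target
    )
    # Completeness: each target class falls inside a single predicted cluster.
    completeness = (
        1.0 if h_pred == 0
        else 1.0 - conditional_entropy(predicted, target) / h_pred
    )
    if homogeneity + completeness == 0:
        return 0.0
    return 2 * homogeneity * completeness / (homogeneity + completeness)


def percent_improvement(before, after):
    """Assumed definition: relative change of the score, in percent."""
    return 100.0 * (after - before) / before
```

V-Measure is invariant to how cluster labels are named, so a refined clustering that matches the target up to a relabelling scores 1.0, while a clustering that places every instance in one cluster scores 0.0 against a target with several classes.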

Degree

College and Department

Physical and Mathematical Sciences; Computer Science

Rights

http://lib.byu.edu/about/copyright/