Journal of Undergraduate Research

Keywords

cluster analysis, random partition distributions, data analysis technique

College

Physical and Mathematical Sciences

Department

Statistics

Abstract

Cluster analysis is an important exploratory data analysis technique used in a wide variety of fields. It seeks to discover a natural grouping of the data, in which items in the same cluster are more similar to one another than to items in different clusters. In our research, we developed a novel method for cluster analysis that takes pairwise distance information as input. Our method, cluster analysis via random partition distributions (CaviarPD), improves upon traditional methods that take the same input, such as hierarchical clustering. Because CaviarPD is based on probability distributions, it allows the user to make statements about the probability that items are clustered together. Hierarchical clustering, a heuristic method, does not support any probability statements about how items are clustered. CaviarPD also gives a better visualization of the estimated clustering. Furthermore, in case studies where the true grouping of the data is known, CaviarPD estimates that grouping better than hierarchical clustering does. We now explain hierarchical clustering and how our new method, CaviarPD, compares to this traditional method.
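
As a point of reference for the traditional approach, the following is a minimal sketch of hierarchical (agglomerative) clustering driven only by a pairwise distance matrix. It is not the CaviarPD method and is not taken from the article. The function name `average_linkage`, the choice of average linkage, and the toy distance matrix are all assumptions made for illustration.

```python
from itertools import combinations


def average_linkage(distances, n_clusters):
    """Agglomerative clustering from a symmetric pairwise distance matrix.

    Starts with every item in its own cluster and repeatedly merges the two
    clusters with the smallest average between-cluster distance until
    n_clusters remain. Illustrative only; not the article's method.
    """
    clusters = [[i] for i in range(len(distances))]

    def avg_dist(a, b):
        # Mean distance over all pairs with one item drawn from each cluster.
        return sum(distances[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > n_clusters:
        # Find the closest pair of clusters under average linkage.
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda pair: avg_dist(clusters[pair[0]], clusters[pair[1]]),
        )
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)

    return sorted(sorted(c) for c in clusters)


# Hypothetical toy data: items 0-2 are mutually close, as are items 3-4.
toy_distances = [
    [0.0, 1.0, 1.5, 9.0, 8.5],
    [1.0, 0.0, 1.2, 9.5, 9.0],
    [1.5, 1.2, 0.0, 8.0, 8.8],
    [9.0, 9.5, 8.0, 0.0, 0.7],
    [8.5, 9.0, 8.8, 0.7, 0.0],
]

clusters = average_linkage(toy_distances, n_clusters=2)
print(clusters)
```

The result is a single partition of the items. Nothing in this procedure attaches a probability to any two items sharing a cluster, which is the limitation of heuristic methods that the abstract contrasts with a distribution-based approach.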
