Journal of Undergraduate Research
Keywords
cluster analysis, random partition distributions, data analysis technique
College
Physical and Mathematical Sciences
Department
Statistics
Abstract
Cluster analysis is an important exploratory data analysis technique used in a wide variety of fields. It seeks to discover a natural grouping of the data, in which items in the same cluster are more similar to one another than to items in different clusters. Through our research, we developed a novel method for cluster analysis that takes pairwise distance information as input. Our method, cluster analysis via random partition distributions (CaviarPD), improves upon traditional methods that take the same input, such as hierarchical clustering. Because CaviarPD is based on probability distributions, it allows the user to make statements about the probability that items are clustered together. Hierarchical clustering, a heuristic method, does not support any probability statements about how items are clustered. CaviarPD also gives a clearer visualization of the estimated clustering. Furthermore, in case studies where the true grouping of the data is known, CaviarPD estimates that grouping more accurately than hierarchical clustering does. We now explain hierarchical clustering and how CaviarPD compares to this traditional method.
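To make the contrast concrete, the following is a minimal sketch in Python and not the authors' CaviarPD implementation. It shows (a) a heuristic single-linkage hierarchical clustering driven only by a pairwise distance matrix, and (b) how a collection of partitions could be summarized as pairwise co-clustering proportions, the kind of probability statement a distribution-based method makes possible. The function names, the toy distance matrix, and the partition samples are all hypothetical and chosen only for illustration; how CaviarPD actually obtains its partitions is not specified in this abstract.

```python
from itertools import combinations


def single_linkage(dist, k):
    """Agglomerative clustering with single linkage, stopped at k clusters.

    dist is a symmetric list-of-lists matrix of pairwise distances.
    Returns one cluster label per item.
    """
    clusters = [{i} for i in range(len(dist))]
    while len(clusters) > k:
        # Pick the two clusters whose closest members are nearest to each other.
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: min(
                dist[i][j] for i in clusters[p[0]] for j in clusters[p[1]]
            ),
        )
        clusters[a] |= clusters[b]  # a < b, so deleting b leaves index a valid
        del clusters[b]
    labels = [0] * len(dist)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


def co_clustering_proportions(partitions):
    """Fraction of the given partitions in which each pair of items shares a cluster."""
    n = len(partitions[0])
    return [
        [sum(p[i] == p[j] for p in partitions) / len(partitions) for j in range(n)]
        for i in range(n)
    ]


# Toy distances: items 0 and 1 are close, items 2 and 3 are close.
toy_dist = [
    [0.0, 1.0, 8.0, 9.0],
    [1.0, 0.0, 7.0, 8.0],
    [8.0, 7.0, 0.0, 2.0],
    [9.0, 8.0, 2.0, 0.0],
]
hier_labels = single_linkage(toy_dist, k=2)

# Hypothetical partition samples, made up for illustration (not model output).
toy_samples = [[0, 0, 1, 1], [0, 0, 1, 1], [0, 0, 1, 2], [0, 1, 2, 2]]
pair_probs = co_clustering_proportions(toy_samples)
```

The hierarchical routine returns a single labeling with no measure of uncertainty, whereas the proportion matrix attaches a number between 0 and 1 to every pair of items, which is the sort of statement the abstract says a heuristic method cannot provide.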
Recommended Citation
Carter, Brandon and Dahl, Dr. David B. (2019) "Cluster Analysis via Random Partition Distributions," Journal of Undergraduate Research: Vol. 2019: Iss. 2019, Article 161.
Available at: https://scholarsarchive.byu.edu/jur/vol2019/iss2019/161