Array comparative genomic hybridization (aCGH) is a technique for identifying duplications and deletions of DNA at specific locations across a genome. Potential objectives of aCGH analysis include identifying (1) altered regions for a given subject, (2) altered regions across a set of individuals, and (3) clinically relevant clusters of hybridizations. aCGH analysis is particularly useful when it identifies previously unknown clusters with clinical relevance. This project assesses existing aCGH clustering methodologies. Three methodologies are considered: hierarchical clustering, weighted clustering of called aCGH data, and clustering based on probabilistic recurrent regions of alteration within subsets of individuals. The assessment is conducted first through the analysis of aCGH data obtained from patients with ovarian cancer and then through simulations. In the data analysis, performance is assessed by how strongly cluster assignments are associated with clinical outcomes (e.g., survival). In the simulations, 1,000 runs per method are summarized with Cohen's kappa coefficient, interpreted as the proportion of correct cluster assignments beyond what random chance would produce. Both the data analysis and the simulation results suggest that hierarchical clustering tends to find more clinically relevant clusters than the other methods. Additionally, more of the patients in these clusters belong in the clusters to which they are assigned.
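
As an illustration of the summary statistic named above, the following is a minimal sketch of Cohen's kappa, computed as (p_o - p_e) / (1 - p_e), where p_o is the observed proportion of agreement and p_e is the agreement expected by chance. The function name `cohens_kappa` and the toy labels are hypothetical and are not taken from the project. The sketch also assumes that cluster labels have already been matched to the true group labels, since clustering output labels are otherwise arbitrary.

```python
from collections import Counter


def cohens_kappa(true_labels, assigned_labels):
    """Cohen's kappa between true group labels and assigned cluster labels.

    Assumes the cluster labels have already been aligned with the true
    labels (hypothetical preprocessing step, not shown here).
    """
    if len(true_labels) != len(assigned_labels) or len(true_labels) == 0:
        raise ValueError("label sequences must be non-empty and of equal length")

    n = len(true_labels)

    # Observed agreement: fraction of subjects placed in their true group.
    p_o = sum(t == a for t, a in zip(true_labels, assigned_labels)) / n

    # Chance agreement: product of the marginal proportions, summed over labels.
    true_counts = Counter(true_labels)
    assigned_counts = Counter(assigned_labels)
    p_e = sum(true_counts[k] * assigned_counts.get(k, 0) for k in true_counts) / n ** 2

    if p_e == 1.0:
        # Degenerate case: every subject carries the same single label.
        return 1.0

    return (p_o - p_e) / (1.0 - p_e)


# Toy example (made-up labels): 6 of 8 subjects are assigned correctly.
true_groups = [0, 0, 0, 0, 1, 1, 1, 1]
clusters = [0, 0, 0, 1, 1, 1, 1, 0]
kappa = cohens_kappa(true_groups, clusters)
```

In this toy example the observed agreement is 0.75 and the chance agreement is 0.5, so kappa is 0.5: half of the possible improvement over chance is achieved. In a simulation setting, one such value would be computed per simulated data set and the values then summarized per method.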



College and Department

Physical and Mathematical Sciences; Statistics



Date Submitted


Document Type

Selected Project




Keywords

array CGH, hierarchical clustering, WECCA