Spatial clustering of array CGH features in combination with hierarchical multiple testing
Kyung In Kim, Etienne Roquain (PMA), Mark Van De Wiel

TL;DR
This paper introduces a novel model-based clustering method for array CGH data that captures spatial genomic dependencies and integrates hierarchical multiple testing to identify associations with clinical variables.
Contribution
It presents a new clustering approach that distinguishes data-collapsing from clustering, and combines this with hierarchical testing to control error rates in genomic studies.
Findings
Clustering captures spatial genomic dependency effectively.
The hierarchical testing procedure controls the family-wise error rate in association testing.
The method is illustrated on two cancer datasets.
Abstract
We propose a new approach for clustering DNA features using array CGH data from multiple tumor samples. We distinguish data-collapsing (joining contiguous DNA clones or probes with extremely similar data into regions) from clustering (joining contiguous, correlated regions based on a maximum likelihood principle). The model-based clustering algorithm accounts for the apparent spatial patterns in the data. We evaluate the randomness of the clustering result by a cluster stability score in combination with cross-validation. Moreover, we argue that the clustering captures genuine spatial genomic dependency by showing that coincidental clustering of independent regions is very unlikely. Using the region and cluster information, we test both regions and clusters for association with a clinical variable in a hierarchical multiple testing approach. This allows for interpreting the significance of…
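To make the data-collapsing step concrete, here is a minimal sketch in Python. It is not the authors' implementation: the function name `collapse_contiguous`, the tolerance `tol`, and the similarity rule (maximum absolute difference between neighbouring probes across all samples) are assumptions chosen only for illustration. The abstract does not specify how "extremely similar" is measured.

```python
import numpy as np

def collapse_contiguous(data, tol=0.05):
    """Group contiguous probes with near-identical profiles into regions.

    data: array of shape (n_probes, n_samples), rows in genomic order.
    tol:  hypothetical threshold; neighbouring probes whose profiles differ
          by at most tol in every sample are placed in the same region.
    Returns a list of (start, end) probe index pairs, both inclusive.
    """
    data = np.asarray(data, dtype=float)
    n_probes = data.shape[0]
    if n_probes == 0:
        return []

    regions = []
    start = 0
    for i in range(1, n_probes):
        # Compare each probe with its immediate predecessor on the genome.
        if np.max(np.abs(data[i] - data[i - 1])) > tol:
            regions.append((start, i - 1))
            start = i
    # Close the final region.
    regions.append((start, n_probes - 1))
    return regions

# Toy example: five probes measured on three samples.
profiles = [
    [0.00, 0.0, 1.0],
    [0.00, 0.0, 1.0],
    [0.01, 0.0, 1.0],
    [1.00, 1.0, 0.0],
    [1.00, 1.0, 0.0],
]
regions = collapse_contiguous(profiles, tol=0.05)
print(regions)
```

In this sketch only adjacent probes are ever merged, which mirrors the contiguity requirement in the abstract. The subsequent clustering of correlated regions by maximum likelihood is a separate step and is not shown here.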
