A Hybrid Mixture Approach for Clustering and Characterizing Cancer Data
Kazeem Kareem, Fan Dai

TL;DR
This paper introduces a hybrid matrix-free method for efficient clustering and characterization of high-dimensional cancer data, improving convergence speed and accuracy over existing techniques.
Contribution
It presents a novel hybrid computational scheme combining Gaussian mixtures with generalized factor analyzers, enabling scalable analysis of large biomedical datasets.
Findings
Faster convergence than existing methods
High accuracy in breast cancer subtype identification
Effective characterization of lymphoma subtypes
Abstract
Model-based clustering is widely used to identify and distinguish types of diseases. However, the high dimensionality of modern biomedical data makes model estimation challenging in traditional cluster analysis. Incorporating factor analyzers into the mixture model provides a way to characterize the large set of data features, but the current estimation method is computationally impractical for massive data. The embedded algorithms converge slowly, and the size of the factor analyzers cannot be varied, which prevents the implementation of a generalized mixture of factor analyzers and further characterization of the data clusters. We propose a hybrid matrix-free computational scheme to efficiently estimate the clusters and model parameters based on a Gaussian mixture along with generalized factor analyzers to summarize the large…
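To make the "matrix-free" idea concrete, the following is a minimal sketch and not the authors' algorithm. It assumes the standard factor-analyzer covariance form Sigma = Lambda Lambda^T + diag(psi), which the abstract does not spell out, and the function names `factor_cov_matvec` and `factor_cov_solve` are hypothetical. It shows how products and solves with a p x p covariance can be computed from a p x q loading matrix without ever forming the full matrix, which is what keeps the cost manageable when p is large and q is small.

```python
import numpy as np


def factor_cov_matvec(Lambda, psi, v):
    """Return (Lambda @ Lambda.T + diag(psi)) @ v without building the p x p matrix.

    Lambda: (p, q) loading matrix, psi: (p,) positive uniquenesses, v: (p,) vector.
    Cost is O(p * q) instead of O(p^2).
    """
    return Lambda @ (Lambda.T @ v) + psi * v


def factor_cov_solve(Lambda, psi, v):
    """Solve (Lambda @ Lambda.T + diag(psi)) x = v using the Woodbury identity.

    Only a q x q linear system is solved, so the p x p covariance is never formed.
    """
    q = Lambda.shape[1]
    scaled = Lambda / psi[:, None]            # Psi^{-1} Lambda, shape (p, q)
    small = np.eye(q) + Lambda.T @ scaled     # I_q + Lambda^T Psi^{-1} Lambda
    return v / psi - scaled @ np.linalg.solve(small, scaled.T @ v)


if __name__ == "__main__":
    # Illustrative sizes only: p features, q latent factors (both assumptions).
    rng = np.random.default_rng(0)
    p, q = 2000, 5
    Lambda = rng.normal(size=(p, q))
    psi = rng.uniform(0.5, 2.0, size=p)
    v = rng.normal(size=p)

    x = factor_cov_solve(Lambda, psi, v)
    # Applying the covariance to the solution should recover v.
    print(np.allclose(factor_cov_matvec(Lambda, psi, x), v))
```

In a mixture setting, each cluster would carry its own `Lambda` and `psi`, and allowing the number of columns of `Lambda` to differ by cluster is one way to read "generalized" factor analyzers; that reading is an assumption here rather than something the abstract states.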
