A Hybrid Mixture Approach for Clustering and Characterizing Cancer Data

Kazeem Kareem; Fan Dai

arXiv:2507.14380·stat.ME·July 22, 2025

TL;DR

This paper introduces a hybrid matrix-free method for efficient clustering and characterization of high-dimensional cancer data, improving convergence speed and accuracy over existing techniques.

Contribution

It presents a novel hybrid computational scheme combining Gaussian mixtures with generalized factor analyzers, enabling scalable analysis of large biomedical datasets.

Findings

01 Faster convergence than existing methods

02 High accuracy in breast cancer subtype identification

03 Effective characterization of lymphoma subtypes

Abstract

Model-based clustering is widely used to identify and distinguish disease types. However, the high dimensionality of modern biomedical data makes model estimation challenging in traditional cluster analysis. Incorporating factor analyzers into the mixture model provides a way to characterize the large set of data features, but the current estimation method is computationally impractical for massive data because of the intrinsically slow convergence of the embedded algorithms and the inability to vary the size of the factor analyzers, which prevents the implementation of a generalized mixture of factor analyzers and further characterization of the data clusters. We propose a hybrid matrix-free computational scheme to efficiently estimate the clusters and model parameters based on a Gaussian mixture along with generalized factor analyzers to summarize the large…
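
To illustrate why a factor-analyzer covariance helps in high dimensions, here is a minimal sketch in Python with NumPy. It is not the authors' hybrid matrix-free scheme, which the truncated abstract does not spell out. It only shows the standard low-rank identities (Woodbury and the matrix determinant lemma) that let one evaluate a Gaussian density with covariance Lambda Lambda^T + diag(psi) without ever forming the p x p matrix, and how components with different numbers of factors can sit in one mixture. The function names `fa_logpdf` and `responsibilities` and the parameter names `Lambda` and `psi` are hypothetical, chosen for this example.

```python
import numpy as np

def fa_logpdf(X, mu, Lambda, psi):
    """Log-density of N(mu, Lambda @ Lambda.T + diag(psi)) at each row of X.

    X: (n, p) data, mu: (p,) mean, Lambda: (p, q) loadings, psi: (p,) positive
    uniquenesses. Only q x q systems are solved; the p x p covariance is
    never built. Illustrative helper, not taken from the paper.
    """
    n, p = X.shape
    q = Lambda.shape[1]
    D = X - mu                      # centered data, (n, p)
    R = D / psi                     # rows hold Psi^{-1} (x - mu)
    # q x q matrix M = I_q + Lambda^T Psi^{-1} Lambda
    M = np.eye(q) + Lambda.T @ (Lambda / psi[:, None])
    L = np.linalg.cholesky(M)       # M = L L^T
    # Woodbury: d^T Sigma^{-1} d = d^T Psi^{-1} d - b^T M^{-1} b,
    # with b = Lambda^T Psi^{-1} d
    B = R @ Lambda                  # (n, q), each row is b^T
    Z = np.linalg.solve(L, B.T)     # (q, n), columns are L^{-1} b
    maha = np.sum(D * R, axis=1) - np.sum(Z ** 2, axis=0)
    # Determinant lemma: log|Sigma| = log|M| + sum(log psi)
    logdet = 2.0 * np.sum(np.log(np.diag(L))) + np.sum(np.log(psi))
    return -0.5 * (p * np.log(2.0 * np.pi) + logdet + maha)

def responsibilities(X, weights, components):
    """Posterior cluster probabilities for a mixture of factor analyzers.

    components is a list of (mu, Lambda, psi) tuples; each Lambda may have
    a different number of columns, i.e. a different number of factors.
    """
    logp = np.column_stack([
        np.log(w) + fa_logpdf(X, mu, Lam, psi)
        for w, (mu, Lam, psi) in zip(weights, components)
    ])
    logp -= logp.max(axis=1, keepdims=True)   # stabilize before exponentiating
    r = np.exp(logp)
    return r / r.sum(axis=1, keepdims=True)
```

With q much smaller than p, the cost per observation in this sketch scales with p times q rather than p squared. Each cluster carrying its own q is what the abstract refers to as varying the size of the factor analyzers.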
