Automated Cancer Subtyping via Vector Quantization Mutual Information Maximization
Zheng Chen, Lingwei Zhu, Ziwei Yang, Takashi Matsubara

TL;DR
This paper introduces an unsupervised clustering method that uses vector quantization and mutual information maximization to identify cancer subtypes from high-dimensional genetic expression data. The resulting subtypes refine existing labels and correlate with survival rates.
Contribution
It presents a novel, label-agnostic clustering approach that adaptively determines the number of cancer subtypes using mutual information maximization on genetic profiles.
Findings
Refines existing cancer subtype labels
High correlation with cancer survival rates
Automatically determines the number of subtypes
Abstract
Cancer subtyping is crucial for understanding the nature of tumors and providing suitable therapy. However, existing labelling methods are medically controversial, and have driven the process of subtyping away from teaching signals. Moreover, cancer genetic expression profiles are high-dimensional, scarce, and have complicated dependence, thereby posing a serious challenge to existing subtyping models for outputting sensible clustering. In this study, we propose a novel clustering method for exploiting genetic expression profiles and distinguishing subtypes in an unsupervised manner. The proposed method adaptively learns categorical correspondence from latent representations of expression profiles to the subtypes output by the model. By maximizing the problem-agnostic mutual information between input expression profiles and output subtypes, our method can automatically decide a…
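The mutual information objective mentioned in the abstract can be illustrated with a generic estimator. The sketch below is not the authors' implementation: it assumes the model outputs a soft subtype assignment per sample and uses the standard decomposition I(X; Y) = H(Y) - H(Y | X), estimated over a batch. The function name `mutual_information`, the `eps` smoothing constant, and the toy inputs are all hypothetical choices made for illustration.

```python
import numpy as np


def mutual_information(probs, eps=1e-12):
    """Estimate I(X; Y) = H(Y) - H(Y | X) from soft cluster assignments.

    probs: array of shape (n_samples, n_clusters); each row is a
    probability distribution over subtypes for one expression profile.
    eps is a small constant (an assumption here) that avoids log(0).
    """
    probs = np.asarray(probs, dtype=float)
    # Marginal distribution over subtypes, averaged across the batch.
    marginal = probs.mean(axis=0)
    # H(Y): high when samples are spread evenly across subtypes.
    h_marginal = -np.sum(marginal * np.log(marginal + eps))
    # H(Y | X): low when each sample is assigned confidently.
    h_conditional = -np.mean(np.sum(probs * np.log(probs + eps), axis=1))
    return h_marginal - h_conditional


# Toy inputs (made up for illustration, not data from the paper).
confident = np.eye(3)[[0, 1, 2, 0, 1, 2]]  # confident, balanced assignments
uniform = np.full((6, 3), 1.0 / 3.0)       # every sample maximally uncertain
collapsed = np.eye(3)[[0] * 6]             # all samples in a single subtype

print(mutual_information(confident))
print(mutual_information(uniform))
print(mutual_information(collapsed))
```

Under this estimator, confident and balanced assignments score close to log(3) for three clusters, while uniformly uncertain assignments and a collapsed single-cluster solution both score close to zero. In a training setting the negative of this quantity would typically serve as a loss term.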
Taxonomy
Topics: Gene expression and cancer classification · Machine Learning in Bioinformatics · Machine Learning and Data Classification
