Cancer Subtyping via Embedded Unsupervised Learning on Transcriptomics Data
Ziwei Yang, Lingwei Zhu, Zheng Chen, Ming Huang, Naoaki Ono, MD Altaf-Ul-Amin, Shigehiko Kanaya

TL;DR
This paper introduces an unsupervised deep learning approach to cancer subtyping that models the underlying data distribution directly. The authors report that this reduces overfitting on small, high-dimensional datasets, captures molecular features in the latent space more effectively, and improves subtyping accuracy.
Contribution
It proposes a novel vector quantization-based method that bypasses Gaussian assumptions, enhancing unsupervised cancer subtyping from transcriptomics data.
Findings
Better capture of latent space features
Reduced overfitting in small sample sizes
Improved subtyping accuracy
Abstract
Cancer is one of the deadliest diseases worldwide. Accurate diagnosis and classification of cancer subtypes are indispensable for effective clinical treatment. Promising results on automatic cancer subtyping systems have been published recently with the emergence of various deep learning methods. However, such automatic systems often overfit the data due to the high dimensionality and scarcity. In this paper, we propose to investigate automatic subtyping from an unsupervised learning perspective by directly constructing the underlying data distribution itself, hence sufficient data can be generated to alleviate the issue of overfitting. Specifically, we bypass the strong Gaussianity assumption that typically exists but fails in the unsupervised learning subtyping literature due to small-sized samples by vector quantization. Our proposed method better captures the latent space features…
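The abstract names vector quantization as the way around the Gaussianity assumption but does not spell out the operation. As a rough illustration only, and not the authors' implementation, the core step can be sketched as replacing each latent vector with its nearest entry in a discrete codebook. The function names, the two-dimensional latents, and the three-entry codebook below are all hypothetical values chosen for readability; in a trained model the codebook would be learned rather than fixed by hand.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two vectors of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(
    latents: Sequence[Vector], codebook: Sequence[Vector]
) -> Tuple[List[List[float]], List[int]]:
    """Map each latent vector to its nearest codebook entry.

    Returns the quantized vectors and the index of the chosen entry
    for each latent. The discrete index can be read as a cluster
    assignment, with no Gaussian prior placed on the latent space.
    """
    indices = [
        min(range(len(codebook)), key=lambda k: squared_distance(z, codebook[k]))
        for z in latents
    ]
    quantized = [list(codebook[k]) for k in indices]
    return quantized, indices


# Hypothetical toy data: three codebook entries and three encoder outputs.
codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, 2.0]]
latents = [[0.9, 1.2], [0.1, -0.2], [-0.8, 1.7]]

quantized, indices = quantize(latents, codebook)
print(indices)    # [1, 0, 2]
print(quantized)  # [[1.0, 1.0], [0.0, 0.0], [-1.0, 2.0]]
```

In this toy setup each sample ends up tied to one discrete code, which is the sense in which a quantized latent space can stand in for subtype assignments. How the paper trains the codebook and generates additional data is not described in the truncated abstract above, so the sketch stops at the lookup step.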
Taxonomy
Topics: Gene expression and cancer classification · Machine Learning and Data Classification · AI in cancer detection
