A Novel cVAE-Augmented Deep Learning Framework for Pan-Cancer RNA-Seq Classification
Vinil Polepalli

TL;DR
This paper introduces a deep learning framework that uses a class-conditional variational autoencoder (cVAE) to generate synthetic gene expression data, improving pan-cancer classification accuracy from RNA-Seq data.
Contribution
The study presents a novel cVAE-based data augmentation method that enhances deep learning models for pan-cancer RNA-Seq classification, addressing high dimensionality and class imbalance.
Findings
Achieved approximately 98% classification accuracy on test data.
Synthetic data augmentation improved performance, especially for underrepresented classes.
Demonstrated the effectiveness of the cVAE in generating realistic gene expression samples.
Abstract
Pan-cancer classification using transcriptomic (RNA-Seq) data can inform tumor subtyping and therapy selection, but is challenging due to extremely high dimensionality and limited sample sizes. In this study, we propose a novel deep learning framework that uses a class-conditional variational autoencoder (cVAE) to augment training data for pan-cancer gene expression classification. Using 801 tumor RNA-Seq samples spanning 5 cancer types from The Cancer Genome Atlas (TCGA), we first perform feature selection to reduce 20,531 gene expression features to the 500 most variably expressed genes. A cVAE is then trained on this data to learn a latent representation of gene expression conditioned on cancer type, enabling the generation of synthetic gene expression samples for each tumor class. We augment the training set with these cVAE-generated samples (doubling the dataset size) to mitigate…
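The pipeline described in the abstract can be illustrated with a minimal NumPy sketch. This is not the author's implementation, and it rests on the following assumptions:

- Feature selection is done by variance over all samples.
- The cVAE has single linear encoder and decoder layers. The paper's architecture, layer sizes and latent dimension are not given here, so `LATENT_DIM = 16` is an arbitrary choice.
- Reconstruction uses a unit-variance Gaussian term.
- The training set is doubled by generating one synthetic sample per real sample, using the same label.

The data are random placeholders with 2,000 genes instead of 20,531. The model is left untrained: fitting it would mean minimizing `loss` with an autodiff framework, which is omitted. All names (`LinearCVAE`, `select_top_variance_genes`, `one_hot`) are hypothetical.

```python
import numpy as np

# Hypothetical sketch: placeholder data and an untrained linear cVAE. Not the paper's code.
rng = np.random.default_rng(0)

N_SAMPLES, N_GENES, N_CLASSES = 801, 2000, 5   # the real data have 20,531 genes
TOP_K, LATENT_DIM = 500, 16                     # LATENT_DIM is an assumed value


def select_top_variance_genes(X, k):
    """Indices of the k columns (genes) with the highest variance, most variable first."""
    variances = X.var(axis=0)
    return np.argsort(variances)[-k:][::-1]


def one_hot(labels, n_classes):
    """Encode integer class labels as one-hot rows (the cVAE condition vector)."""
    out = np.zeros((labels.shape[0], n_classes))
    out[np.arange(labels.shape[0]), labels] = 1.0
    return out


class LinearCVAE:
    """Hypothetical cVAE with linear encoder/decoder; weights are random, not trained."""

    def __init__(self, n_genes, n_classes, latent_dim, rng, scale=0.01):
        self.n_classes, self.latent_dim, self.rng = n_classes, latent_dim, rng
        # Encoder: [x, y] -> mean and log-variance of q(z | x, y)
        self.W_mu = rng.normal(0.0, scale, (n_genes + n_classes, latent_dim))
        self.W_logvar = rng.normal(0.0, scale, (n_genes + n_classes, latent_dim))
        # Decoder: [z, y] -> reconstructed expression profile
        self.W_dec = rng.normal(0.0, scale, (latent_dim + n_classes, n_genes))

    def encode(self, x, y_onehot):
        h = np.concatenate([x, y_onehot], axis=1)
        return h @ self.W_mu, h @ self.W_logvar

    def decode(self, z, y_onehot):
        return np.concatenate([z, y_onehot], axis=1) @ self.W_dec

    def loss(self, x, y_onehot):
        """Single-sample estimate of the negative ELBO (up to a constant), averaged over rows."""
        mu, logvar = self.encode(x, y_onehot)
        eps = self.rng.standard_normal(mu.shape)
        z = mu + np.exp(0.5 * logvar) * eps              # reparameterization trick
        x_hat = self.decode(z, y_onehot)
        recon = 0.5 * np.sum((x - x_hat) ** 2, axis=1)   # unit-variance Gaussian NLL, constant dropped
        kl = -0.5 * np.sum(1 + logvar - mu**2 - np.exp(logvar), axis=1)  # KL(q || N(0, I))
        return float(np.mean(recon + kl))

    def generate(self, labels):
        """Sample z ~ N(0, I) and decode it conditioned on the requested class labels."""
        z = self.rng.standard_normal((labels.shape[0], self.latent_dim))
        return self.decode(z, one_hot(labels, self.n_classes))


# Placeholder expression matrix with gene-specific spread, plus random class labels.
X = rng.normal(size=(N_SAMPLES, N_GENES)) * rng.uniform(0.5, 3.0, size=N_GENES)
y = rng.integers(0, N_CLASSES, size=N_SAMPLES)

# 1) Feature selection: keep the TOP_K most variable genes.
top_idx = select_top_variance_genes(X, TOP_K)
X_sel = X[:, top_idx]

# 2) The objective a training loop would minimize (training itself is omitted).
model = LinearCVAE(TOP_K, N_CLASSES, LATENT_DIM, rng)
neg_elbo = model.loss(X_sel, one_hot(y, N_CLASSES))

# 3) Augmentation: one synthetic sample per real sample with the same label,
#    treating all toy samples as the training set for brevity.
X_syn = model.generate(y)
X_aug = np.vstack([X_sel, X_syn])
y_aug = np.concatenate([y, y])

print(X_sel.shape, X_aug.shape, y_aug.shape)  # → (801, 500) (1602, 500) (1602,)
```

Both the encoder and the decoder are conditioned on the one-hot label. This is what lets `generate` request samples of a specific cancer type. Generating one sample per real sample keeps the class proportions unchanged. When class imbalance is the main concern, drawing extra samples for the smaller classes would be an alternative design choice.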
