A Novel cVAE-Augmented Deep Learning Framework for Pan-Cancer RNA-Seq Classification

Vinil Polepalli

arXiv:2508.02743·q-bio.GN·August 6, 2025

TL;DR

This paper introduces a deep learning framework that uses a class-conditional variational autoencoder (cVAE) to generate synthetic gene expression data for training-set augmentation, reaching approximately 98% test accuracy on pan-cancer classification from RNA-Seq data.

Contribution

The study presents a novel cVAE-based data augmentation method that enhances deep learning models for pan-cancer RNA-Seq classification, addressing high dimensionality and class imbalance.

Findings

1. Achieved approximately 98% classification accuracy on test data.

2. Synthetic data augmentation improved performance, especially for underrepresented classes.

3. Demonstrated the effectiveness of cVAE in generating realistic gene expression samples.

Abstract

Pan-cancer classification using transcriptomic (RNA-Seq) data can inform tumor subtyping and therapy selection, but is challenging due to extremely high dimensionality and limited sample sizes. In this study, we propose a novel deep learning framework that uses a class-conditional variational autoencoder (cVAE) to augment training data for pan-cancer gene expression classification. Using 801 tumor RNA-Seq samples spanning 5 cancer types from The Cancer Genome Atlas (TCGA), we first perform feature selection to reduce 20,531 gene expression features to the 500 most variably expressed genes. A cVAE is then trained on this data to learn a latent representation of gene expression conditioned on cancer type, enabling the generation of synthetic gene expression samples for each tumor class. We augment the training set with these cVAE-generated samples (doubling the dataset size) to mitigate…
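
As a rough illustration of two steps the abstract describes, the sketch below reduces an expression matrix to its 500 highest-variance genes and builds the class-conditioned inputs a cVAE decoder would consume when generating synthetic samples. It is a minimal sketch under stated assumptions, not the paper's implementation: the data are random placeholders with the abstract's dimensions (801 samples, 20,531 genes, 5 classes), per-gene variance is assumed as the measure of "most variably expressed", and the function names and `latent_dim=32` are hypothetical.

```python
import numpy as np


def select_top_variable_genes(X, k=500):
    """Keep the k columns (genes) with the highest variance across samples."""
    variances = X.var(axis=0)
    # argsort is ascending: take the last k indices, then reverse so the
    # most variable gene comes first.
    idx = np.argsort(variances)[-k:][::-1]
    return X[:, idx], idx


def one_hot(labels, n_classes):
    """Encode integer class labels as one-hot rows."""
    out = np.zeros((labels.shape[0], n_classes), dtype=np.float32)
    out[np.arange(labels.shape[0]), labels] = 1.0
    return out


def decoder_inputs(labels, n_classes, latent_dim, rng):
    """Concatenate z ~ N(0, I) with the one-hot class label.

    A trained cVAE decoder (not shown here) would map each row to one
    synthetic expression profile of the requested class.
    """
    z = rng.standard_normal((labels.shape[0], latent_dim)).astype(np.float32)
    return np.concatenate([z, one_hot(labels, n_classes)], axis=1)


rng = np.random.default_rng(0)

# Placeholder data with the shapes quoted in the abstract (NOT real TCGA data).
X = rng.random((801, 20531), dtype=np.float32)
y = rng.integers(0, 5, size=801)

X_sel, gene_idx = select_top_variable_genes(X, k=500)

# latent_dim=32 is an assumed value; the abstract does not state it.
dec_in = decoder_inputs(y, n_classes=5, latent_dim=32, rng=rng)

print(X_sel.shape)   # (801, 500)
print(dec_in.shape)  # (801, 37)
```

Requesting one synthetic sample per real training label, as done here, would double the dataset while preserving its class proportions; oversampling minority classes instead is another option, and the abstract excerpt does not say which scheme was used.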
