MS-ConTab: Multi-Scale Contrastive Learning of Mutation Signatures for Pan Cancer Representation and Stratification
Yifan Dou, Adam Khadre, Ruben C Petreaca, Golrokh Mirzaei

TL;DR
This paper introduces a novel unsupervised contrastive learning framework that clusters cancer types based on mutation signatures, providing biologically meaningful groupings aligned with known mutational processes and tissue origins.
Contribution
According to the authors, this is the first application of contrastive learning to cohort-level cancer clustering from mutation data. It combines gene-level and chromosome-level signatures into a single representation of each cancer type.
Findings
Clusters align with known mutational processes
Framework is scalable and interpretable
Groups 43 cancer types using coding mutation data from the COSMIC database
Abstract
Motivation. Understanding the pan-cancer mutational landscape offers critical insights into the molecular mechanisms underlying tumorigenesis. While patient-level machine learning techniques have been widely employed to identify tumor subtypes, cohort-level clustering, where entire cancer types are grouped based on shared molecular features, has largely relied on classical statistical methods.
Results. In this study, we introduce a novel unsupervised contrastive learning framework to cluster 43 cancer types based on coding mutation data derived from the COSMIC database. For each cancer type, we construct two complementary mutation signatures: a gene-level profile capturing nucleotide substitution patterns across the most frequently mutated genes, and a chromosome-level profile representing normalized substitution frequencies across chromosomes. These dual views are encoded using…
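The two signatures described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline. The record layout `(chromosome, gene, ref, alt)`, the 12-class substitution encoding in `SUBS`, the 24-chromosome list in `CHROMS`, the `top_k` parameter, the normalization to a total of 1, and the function names `chromosome_profile` and `gene_profile` are all assumptions made for illustration. The toy records are invented and are not COSMIC data.

```python
from collections import Counter

import numpy as np

BASES = "ACGT"
# Assumed encoding: the 12 single-nucleotide substitution classes, e.g. "A>C".
SUBS = [f"{r}>{a}" for r in BASES for a in BASES if r != a]
# Assumed chromosome set: autosomes 1-22 plus X and Y.
CHROMS = [str(i) for i in range(1, 23)] + ["X", "Y"]


def chromosome_profile(mutations):
    """Return a (24, 12) matrix of substitution frequencies that sums to 1."""
    counts = np.zeros((len(CHROMS), len(SUBS)))
    for chrom, _gene, ref, alt in mutations:
        counts[CHROMS.index(chrom), SUBS.index(f"{ref}>{alt}")] += 1
    total = counts.sum()
    return counts / total if total > 0 else counts


def gene_profile(mutations, top_k=2):
    """Return (genes, matrix) for the top_k most frequently mutated genes."""
    gene_counts = Counter(gene for _chrom, gene, _ref, _alt in mutations)
    genes = [g for g, _ in gene_counts.most_common(top_k)]
    counts = np.zeros((len(genes), len(SUBS)))
    for _chrom, gene, ref, alt in mutations:
        if gene in genes:
            counts[genes.index(gene), SUBS.index(f"{ref}>{alt}")] += 1
    total = counts.sum()
    return genes, (counts / total if total > 0 else counts)


# Invented toy records: (chromosome, gene, reference base, alternate base).
toy = [
    ("17", "TP53", "C", "T"),
    ("17", "TP53", "G", "A"),
    ("12", "KRAS", "G", "T"),
    ("7", "EGFR", "T", "G"),
]
chrom_view = chromosome_profile(toy)
genes, gene_view = gene_profile(toy, top_k=2)
```

Under these assumptions, each cancer type yields one matrix per view, and the pair of matrices is what an encoder would consume. The abstract is truncated before it names the encoder, so that step is left out here.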
