Self-omics: A Self-supervised Learning Framework for Multi-omics Cancer Data
Sayed Hashim, Karthik Nandakumar, Mohammad Yaqub

TL;DR
This paper introduces a self-supervised learning framework for multi-omics cancer data that enhances cancer classification and feature extraction, especially in limited labelled data scenarios, by exploiting inter-omics relationships.
Contribution
The authors develop a novel SSL pre-training paradigm that leverages inter-omics relationships and handles missing data, improving cancer classification and feature extraction without extensive annotations.
Findings
Outperforms state-of-the-art methods in cancer type classification on the TCGA dataset.
Pre-trained encoders serve as effective feature extractors without fine-tuning.
The method is robust to missing data and can be extended to zero-shot cancer classification.
Abstract
We have gained access to vast amounts of multi-omics data thanks to Next Generation Sequencing. However, it is challenging to analyse this data due to its high dimensionality and much of it not being annotated. Lack of annotated data is a significant problem in machine learning, and Self-Supervised Learning (SSL) methods are typically used to deal with limited labelled data. However, there is a lack of studies that use SSL methods to exploit inter-omics relationships on unlabelled multi-omics data. In this work, we develop a novel and efficient pre-training paradigm that consists of various SSL components, including but not limited to contrastive alignment, data recovery from corrupted samples, and using one type of omics data to recover other omic types. Our pre-training paradigm improves performance on downstream tasks with limited labelled data. We show that our approach outperforms…
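The three SSL components named in the abstract (contrastive alignment, recovery from corrupted samples, and recovering one omics type from another) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the zero-masking corruption, the InfoNCE-style form of the alignment loss, and the temperature value are all assumptions made for illustration, and the encoders and decoders that would produce the embeddings and reconstructions are left out.

```python
import math
import random


def corrupt(sample, mask_rate, rng):
    """Zero out a random subset of features (assumed corruption scheme).

    Returns the corrupted vector and the boolean mask of corrupted positions.
    """
    mask = [rng.random() < mask_rate for _ in sample]
    corrupted = [0.0 if m else x for x, m in zip(sample, mask)]
    return corrupted, mask


def mse(pred, target):
    """Mean squared error, usable for both recovery objectives:
    corrupted -> clean (same omics) and omics A -> omics B (cross-omics)."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)


def cosine(u, v):
    """Cosine similarity with a small epsilon to avoid division by zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + 1e-12)


def contrastive_alignment_loss(emb_a, emb_b, temperature=0.5):
    """InfoNCE-style loss (an assumed choice, not confirmed by the source).

    emb_a[i] and emb_b[i] are embeddings of the same patient from two omics
    types (the positive pair); all other patients in the batch are negatives.
    """
    n = len(emb_a)
    total = 0.0
    for i in range(n):
        logits = [cosine(emb_a[i], emb_b[j]) / temperature for j in range(n)]
        top = max(logits)  # subtract the max for a stable log-sum-exp
        log_denom = top + math.log(sum(math.exp(l - top) for l in logits))
        total += log_denom - logits[i]
    return total / n
```

In a pre-training loop, the recovery terms would be `mse(decoder(encoder(corrupted)), clean)` and `mse(cross_decoder(encoder(omics_a)), omics_b)`, summed with the alignment loss. How the paper actually weights or combines these terms is not stated in the text above.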
Taxonomy
Topics: Gene expression and cancer classification · Bioinformatics and Genomic Networks · Cancer Genomics and Diagnostics
