scSSL-Bench: Benchmarking Self-Supervised Learning for Single-Cell Data
Olga Ovcharenko, Florian Barkmann, Philip Toma, Imant Daunhawer, Julia Vogt, Sebastian Schelter, Valentina Boeva

TL;DR
scSSL-Bench is a comprehensive benchmark evaluating nineteen self-supervised learning methods across multiple single-cell data tasks, revealing task-specific strengths and the effectiveness of random masking augmentation.
Contribution
This work introduces scSSL-Bench, the first standardized platform for benchmarking SSL methods in single-cell data analysis, providing insights and recommendations for future research.
Findings
Specialized frameworks excel at batch correction.
Generic SSL methods perform well in cell typing.
Random masking outperforms domain-specific augmentations.
Abstract
Self-supervised learning (SSL) has proven to be a powerful approach for extracting biologically meaningful representations from single-cell data. To advance our understanding of SSL methods applied to single-cell data, we present scSSL-Bench, a comprehensive benchmark that evaluates nineteen SSL methods. Our evaluation spans nine datasets and focuses on three common downstream tasks: batch correction, cell type annotation, and missing modality prediction. Furthermore, we systematically assess various data augmentation strategies. Our analysis reveals task-specific trade-offs: the specialized single-cell frameworks scVI and CLAIRE, together with the fine-tuned scGPT, excel at uni-modal batch correction, while generic SSL methods, such as VICReg and SimCLR, demonstrate superior performance in cell typing and multi-modal data integration. Random masking emerges as the most effective augmentation…
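As a rough illustration of the random masking augmentation mentioned above, the sketch below zeroes out a random fraction of entries in a cell-by-gene expression matrix and produces two independently masked views of the same cells, as a contrastive method such as SimCLR would consume. The function names, the 20% default mask rate, and the toy matrix are assumptions made for illustration; they are not taken from the benchmark's code.

```python
import numpy as np


def random_mask(x, mask_rate=0.2, rng=None):
    """Return a copy of x with roughly `mask_rate` of its entries set to zero.

    `mask_rate` is a hypothetical default, not a value from the paper.
    """
    rng = np.random.default_rng() if rng is None else rng
    x = np.asarray(x, dtype=float)
    # Each entry is kept independently with probability 1 - mask_rate.
    keep = rng.random(x.shape) >= mask_rate
    return x * keep


def two_views(x, mask_rate=0.2, seed=0):
    """Build two independently masked views of the same cells."""
    rng = np.random.default_rng(seed)
    return random_mask(x, mask_rate, rng), random_mask(x, mask_rate, rng)


# Toy expression matrix: 4 cells x 5 genes (made-up values).
expression = np.arange(1, 21, dtype=float).reshape(4, 5)
view_a, view_b = two_views(expression, mask_rate=0.5, seed=1)
```

Each view keeps the original value or replaces it with zero, so the encoder has to rely on the unmasked genes to produce similar embeddings for the two views of a cell.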
Taxonomy
Topics: Single-cell and spatial transcriptomics · Domain Adaptation and Few-Shot Learning · Cell Image Analysis Techniques
Methods: Dense Connections · Random Gaussian Blur · Normalized Temperature-scaled Cross Entropy Loss · Feedforward Network · SimCLR
