Parameter-free representations outperform single-cell foundation models on downstream benchmarks
Huan Souza, Pankaj Mehta

TL;DR
This paper demonstrates that simple, parameter-free linear methods can outperform complex transformer-based models on single-cell RNA sequencing tasks, underscoring the importance of rigorous benchmarking.
Contribution
It shows that straightforward, interpretable pipelines can achieve state-of-the-art results, challenging the necessity of deep learning models for certain single-cell analysis tasks.
Findings
Linear methods match or surpass deep learning models on benchmarks
Simple pipelines excel in out-of-distribution tasks
Biological cell identity can be captured by linear representations
Abstract
Single-cell RNA sequencing (scRNA-seq) data exhibit strong and reproducible statistical structure. This has motivated the development of large-scale foundation models, such as TranscriptFormer, that use transformer-based architectures to learn a generative model for gene expression by embedding genes into a latent vector space. These embeddings have been used to obtain state-of-the-art (SOTA) performance on downstream tasks such as cell-type classification, disease-state prediction, and cross-species learning. Here, we ask whether similar performance can be achieved without utilizing computationally intensive deep learning-based representations. Using simple, interpretable pipelines that rely on careful normalization and linear methods, we obtain SOTA or near SOTA performance across multiple benchmarks commonly used to evaluate single-cell foundation models, including outperforming…
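To make the idea of a pipeline built on "careful normalization and linear methods" concrete, here is a minimal sketch in Python. It is not the authors' pipeline. The specific steps (library-size normalization with a log1p transform, PCA via SVD, and a nearest-centroid classifier), the function names, the `target_sum` value, and the synthetic Poisson data are all assumptions chosen for illustration.

```python
import numpy as np


def normalize_counts(counts, target_sum=1e4):
    """Scale each cell (row) to a common total count, then apply log1p.

    target_sum is an illustrative choice, not a value taken from the paper.
    """
    counts = np.asarray(counts, dtype=float)
    lib = counts.sum(axis=1, keepdims=True)
    lib[lib == 0] = 1.0  # avoid division by zero for empty cells
    return np.log1p(counts / lib * target_sum)


def pca_embed(x, n_components=10):
    """Center the data and project it onto the top principal components (SVD)."""
    mean = x.mean(axis=0, keepdims=True)
    _, _, vt = np.linalg.svd(x - mean, full_matrices=False)
    components = vt[:n_components]
    return (x - mean) @ components.T, mean, components


def nearest_centroid_predict(train_emb, train_labels, test_emb):
    """Assign each test cell the label of the closest class centroid."""
    labels = np.unique(train_labels)
    centroids = np.stack(
        [train_emb[train_labels == lab].mean(axis=0) for lab in labels]
    )
    # Squared Euclidean distance from every test cell to every centroid.
    dist = ((test_emb[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return labels[dist.argmin(axis=1)]


def run_demo(seed=0):
    """Synthetic two-cell-type example (hypothetical data); returns test accuracy."""
    rng = np.random.default_rng(seed)
    n_genes, n_cells = 50, 100
    # Type A expresses the first half of the genes strongly, type B the second half.
    rate_a = np.where(np.arange(n_genes) < n_genes // 2, 5.0, 0.5)
    rate_b = rate_a[::-1]
    counts = np.vstack(
        [
            rng.poisson(rate_a, size=(n_cells, n_genes)),
            rng.poisson(rate_b, size=(n_cells, n_genes)),
        ]
    )
    labels = np.array(["A"] * n_cells + ["B"] * n_cells)

    # Random half/half split into train and test cells.
    order = rng.permutation(2 * n_cells)
    train, test = order[:n_cells], order[n_cells:]

    x = normalize_counts(counts)
    # Fit the linear embedding on training cells only, then reuse it for test cells.
    train_emb, mean, comps = pca_embed(x[train], n_components=5)
    test_emb = (x[test] - mean) @ comps.T

    pred = nearest_centroid_predict(train_emb, labels[train], test_emb)
    return float((pred == labels[test]).mean())
```

Calling `run_demo()` returns the held-out accuracy on the synthetic data. Every step is a closed-form linear-algebra operation with no gradient-based training, which is the kind of property the abstract contrasts with transformer-based embeddings.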
Taxonomy
Topics: Single-cell and spatial transcriptomics · Generative Adversarial Networks and Image Synthesis · Cell Image Analysis Techniques
