# Representation learning of single-cell RNA-seq data

**Authors:** Constantin Ahlmann-Eltze, Florian Barkmann, Jan Lause, Valentina Boeva, Dmitry Kobak

PMC · DOI: 10.1261/rna.080889.125 · RNA · 2026-04-01

## TL;DR

This review surveys methods that learn compact, lower-dimensional representations of single-cell RNA-seq (scRNA-seq) data for downstream tasks such as clustering, visualization, and trajectory inference, and for integrating data from multiple experiments.

## Contribution

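The review provides a coherent taxonomy that frames factor models, autoencoders, contrastive learning approaches, and transformer-based foundation models as instances of a single representation learning paradigm for scRNA-seq, articulating their conceptual foundations, shared assumptions, and key distinctions.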

## Key findings

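- Representation learning methods compress high-dimensional, sparse, and noisy single-cell transcriptomes into lower-dimensional representations that are meant to preserve essential variation while reducing noise.
- Factor models, autoencoders, contrastive learning approaches, and transformer-based foundation models are presented as distinct instances of one representation learning paradigm for scRNA-seq.
- Some methods integrate data from multiple experiments by learning a common latent representation; the review also discusses benchmarking and identifies major challenges and open questions.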

## Abstract

Single-cell RNA sequencing (scRNA-seq) has become a cornerstone experimental technique in tissue biology, with gene expression data for over 100 million cells available in public repositories. The high dimensionality, sparsity, and technical noise inherent to scRNA-seq data have motivated the development of a broad spectrum of representation learning approaches. These methods learn compressed, lower-dimensional representations of single-cell transcriptomes that are meant to preserve essential variation while reducing noise, and can be used for clustering, visualization, trajectory inference, and other downstream tasks. Furthermore, methods have emerged that aim to integrate data from multiple experiments by learning a common latent representation. In this review, we frame factor models, autoencoders, contrastive learning approaches, and transformer-based foundation models as distinct instances of the representation learning paradigm for scRNA-seq. We provide a coherent taxonomy of these methods that articulates their conceptual foundations, shared assumptions, and key distinctions. We also discuss benchmarking and identify major challenges and open questions that will shape the future of the field.
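
To make the idea of a compressed, lower-dimensional representation concrete, here is a minimal sketch of one of the simplest linear factor models: PCA applied to depth-normalized, log-transformed counts. It is an illustration under stated assumptions, not a method taken from the review. The function name `pca_embedding`, the `target_sum` scaling constant, the choice of 10 components, and the Poisson toy data are all hypothetical choices made for this example.

```python
import numpy as np


def pca_embedding(counts, n_components=10, target_sum=1e4):
    """Embed cells with PCA on log-normalized counts (illustrative sketch).

    counts: array of shape (n_cells, n_genes) with raw counts.
    Returns (embedding, loadings) with shapes
    (n_cells, n_components) and (n_genes, n_components).
    """
    counts = np.asarray(counts, dtype=float)

    # Scale every cell to the same total count to reduce depth effects.
    depth = counts.sum(axis=1, keepdims=True)
    depth[depth == 0] = 1.0  # avoid division by zero for empty cells
    normalized = counts / depth * target_sum

    # log(1 + x) transform, a common variance-stabilizing heuristic.
    logged = np.log1p(normalized)

    # Center each gene, then take a truncated SVD (equivalent to PCA).
    centered = logged - logged.mean(axis=0, keepdims=True)
    u, s, vt = np.linalg.svd(centered, full_matrices=False)

    embedding = u[:, :n_components] * s[:n_components]  # cell coordinates
    loadings = vt[:n_components].T                      # gene weights
    return embedding, loadings


# Toy data: 200 cells x 500 genes of sparse Poisson counts (assumption,
# stands in for a real count matrix).
rng = np.random.default_rng(0)
toy_counts = rng.poisson(lam=0.5, size=(200, 500))

embedding, loadings = pca_embedding(toy_counts, n_components=10)
print(embedding.shape, loadings.shape)
```

The resulting `embedding` plays the role of the latent representation that would be passed on to clustering, visualization, or trajectory inference. Nonlinear approaches such as autoencoders replace the linear projection with a learned encoder.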

## Full-text entities

- **Genes:** CRLS1 (cardiolipin synthase 1) [NCBI Gene 54675] {aka C20orf155, CLS, CLS1, COSPD57, GCD10, dJ967N21.6}

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12990802/full.md

## Figures

2 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12990802/full.md

## References

115 references — full list in the complete paper: https://tomesphere.com/paper/PMC12990802/full.md

---
Source: https://tomesphere.com/paper/PMC12990802