# Learning interpretable representations of single-cell multi-omics data with multi-output Gaussian processes

**Authors:** Zahra Moslehi, Sareh AmeriFar, Kevin de Azevedo, Florian Buettner

PMC · DOI: 10.1093/nar/gkaf630 · 2025-07-22

## TL;DR

This paper introduces a method for analyzing single-cell multi-omics data that combines expressive representation learning with interpretability.

## Contribution

A novel framework that combines expressive embeddings with interpretable multi-output Gaussian processes for single-cell multi-omics data.
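
The summary does not spell out how the Gaussian process couples cell and gene representations. The sketch below shows one common construction, assumed here in the style of latent-variable multi-output GPs (LVMOGP): the covariance between expression entries (n, d) and (n', d') is the product of an RBF kernel on cell latents and an RBF kernel on gene latents. Stacking the cells × genes matrix column by column, this gives the Kronecker product K_H ⊗ K_X. The helper names `rbf` and `separable_kernel`, the toy sizes, and the random latents are illustrative assumptions, not the authors' code.

```python
import numpy as np

def rbf(A, B, lengthscales, variance=1.0):
    """ARD squared-exponential kernel between the rows of A and the rows of B."""
    A_s = A / lengthscales
    B_s = B / lengthscales
    sq = (np.sum(A_s**2, axis=1)[:, None]
          + np.sum(B_s**2, axis=1)[None, :]
          - 2.0 * A_s @ B_s.T)
    # Clip tiny negative values caused by floating-point round-off.
    return variance * np.exp(-0.5 * np.maximum(sq, 0.0))

def separable_kernel(X, H, ls_x, ls_h):
    """Covariance of vec(Y) for Y (cells x genes), stacked gene by gene."""
    K_x = rbf(X, X, ls_x)  # N x N covariance between cells
    K_h = rbf(H, H, ls_h)  # D x D covariance between genes
    # Entry [d*N + n, d2*N + n2] equals K_h[d, d2] * K_x[n, n2].
    return np.kron(K_h, K_x)

rng = np.random.default_rng(0)
N, D, Q = 5, 4, 2              # cells, genes, latent dimensions (toy sizes)
X = rng.normal(size=(N, Q))    # hypothetical cell latents
H = rng.normal(size=(D, Q))    # hypothetical gene latents
K = separable_kernel(X, H, ls_x=np.ones(Q), ls_h=np.ones(Q))
print(K.shape)  # (20, 20)
```

The separable form is what keeps cells and genes in distinct latent spaces: each space gets its own kernel and lengthscales, and the Kronecker structure lets inference avoid forming the full (D·N) × (D·N) matrix in practice.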

## Key findings

- The model learns distinct representations for cells and genes from multi-modal data.
- Even a few interpretable latent dimensions effectively capture the data's underlying structure.
- Gene relevance maps connect cell clusters with their marker genes in latent space.
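
The exact definition of the gene relevance map is given in the full paper. As a rough, hypothetical proxy, one can score how strongly each gene cluster is associated with each cell cluster by averaging the model's predicted expression (for example, GP posterior means) over the cells and genes in each cluster pair. The function `cluster_relevance`, the toy matrix, and the cluster labels below are illustrative assumptions, not the authors' method.

```python
import numpy as np

def cluster_relevance(Y_hat, cell_labels, gene_labels):
    """Mean predicted expression for each (cell cluster, gene cluster) pair,
    centred per gene cluster so each column shows which cell cluster stands out."""
    cell_ids = np.unique(cell_labels)
    gene_ids = np.unique(gene_labels)
    R = np.zeros((len(cell_ids), len(gene_ids)))
    for i, c in enumerate(cell_ids):
        for j, g in enumerate(gene_ids):
            # Select the block of cells in cluster c and genes in cluster g.
            R[i, j] = Y_hat[np.ix_(cell_labels == c, gene_labels == g)].mean()
    return R - R.mean(axis=0, keepdims=True), cell_ids, gene_ids

Y_hat = np.array([[5, 5, 0, 0],
                  [5, 5, 0, 0],
                  [0, 0, 5, 5],
                  [0, 0, 5, 5]], dtype=float)  # toy predicted expression (cells x genes)
cells = np.array([0, 0, 1, 1])  # hypothetical cell-cluster labels
genes = np.array([0, 0, 1, 1])  # hypothetical gene-cluster labels
R, cell_ids, gene_ids = cluster_relevance(Y_hat, cells, genes)
marker_cluster = cell_ids[R.argmax(axis=0)]  # cell cluster each gene cluster marks
print(marker_cluster)  # [0 1]
```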

## Abstract

Learning representations of single-cell genomics data is challenging due to the nonlinear and often multi-modal nature of the data on one hand and the need for interpretable representations on the other hand. Existing approaches tend to focus either on interpretability aspects via linear matrix factorization or on maximizing expressive power via neural network-based embeddings using black-box variational autoencoders or graph embedding approaches. We address this trade-off between expressive power and interpretability by introducing a novel approach that combines highly expressive representation learning via an embedding layer with interpretable multi-output Gaussian processes within a unified framework. In our model, we learn distinct representations for samples (cells) and features (genes) from multi-modal single-cell data. We demonstrate that even a few interpretable latent dimensions can effectively capture the underlying structure of the data. Our model yields interpretable relationships between groups of cells and their associated marker genes: leveraging a gene relevance map, we establish connections between cell clusters (e.g. specific cell types) and feature clusters (e.g. marker genes for those specific cell types) within the learned latent spaces of cells and features.
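
The abstract does not state how the number of useful latent dimensions is identified. A common approach in GP latent variable models, assumed here for illustration, is automatic relevance determination (ARD): each latent dimension q gets its own lengthscale ℓ_q in the RBF kernel, and a very large ℓ_q means the function barely varies along that dimension, effectively switching it off. The sketch ranks dimensions by the inverse squared lengthscale; the function name, threshold, and lengthscale values are hypothetical.

```python
import numpy as np

def rank_latent_dims(lengthscales, threshold=0.1):
    """Rank latent dimensions by ARD relevance (inverse squared lengthscale),
    normalised so the most relevant dimension scores 1."""
    relevance = 1.0 / np.asarray(lengthscales, dtype=float) ** 2
    relevance = relevance / relevance.max()
    order = np.argsort(-relevance)                    # most relevant first
    active = order[relevance[order] >= threshold]     # dimensions kept as "active"
    return order, relevance, active

ls = [0.8, 25.0, 1.5, 40.0, 2.0]  # hypothetical learned lengthscales
order, rel, active = rank_latent_dims(ls)
print(active)  # [0 2 4]
```

Under this reading, a handful of short-lengthscale dimensions carrying most of the structure is what makes the representation compact enough to inspect directly.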


## Full-text entities

- **Genes:** XCL1 (X-C motif chemokine ligand 1) [NCBI Gene 6375] {aka ATAC, LPTN, LTN, SCM-1, SCM-1a, SCM1}, IL7R (interleukin 7 receptor) [NCBI Gene 3575] {aka CD127, CDW127, IL-7R-alpha, IL-7Ralpha, IL7RA, IL7Ralpha}, CD19 (CD19 molecule) [NCBI Gene 930] {aka B4, CVID3}, CD27 (CD27 molecule) [NCBI Gene 939] {aka S152, S152. LPFS2, T14, TNFRSF7, Tp55}, CD4 (CD4 molecule) [NCBI Gene 920] {aka CD4mut, IMD79, Leu-3, OKT4D, T4}, CD28 (CD28 molecule) [NCBI Gene 940] {aka IMD123, Tp44}, CARMN (cardiac mesoderm enhancer-associated non-coding RNA) [NCBI Gene 728264] {aka CARMEN, MIR143HG}, FCGR3A (Fc gamma receptor IIIa) [NCBI Gene 2214] {aka CD16-II, CD16A, FCG3, FCGR3, FCRIIIA, FcGRIIIA}, CD14 (CD14 molecule) [NCBI Gene 929], CD8A (CD8 subunit alpha) [NCBI Gene 925] {aka CD8, CD8alpha, IMD116, Leu2, p32}, TIGIT (T cell immunoreceptor with Ig and ITIM domains) [NCBI Gene 201633] {aka VSIG9, VSTM3, WUCAM}, CD86 (CD86 molecule) [NCBI Gene 942] {aka B7-2, B7.2, B70, BU63, CD28LG2, CD86 v6}, CD34 (CD34 molecule) [NCBI Gene 947], EMX2OS (EMX2 opposite strand/antisense RNA) [NCBI Gene 196047] {aka EMX2-AS1, NCRNA00045}, NCAM1 (neural cell adhesion molecule 1) [NCBI Gene 4684] {aka CD56, MSK39, NCAM}, KRT20 (keratin 20) [NCBI Gene 54474] {aka CD20, CK-20, CK20, K20, KRT21}
- **Diseases:** LVM (MESH:C536141), melanoma (MESH:D008545)
- **Chemicals:** lipid (MESH:D008055), Bu (MESH:D002066), 5k-CITE (-), Au (MESH:D006046)
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Figures

11 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12282953/full.md

---
Source: https://tomesphere.com/paper/PMC12282953