# CellPredX, a computational framework for cross-data type, cross-sample, and cross-protocol cell type annotation through domain adaptation and deep metric learning

**Authors:** Yan Liu, Yu Xia, He Yan, Long-Chen Shen, Yiheng Zhu, Ji-Peng Qiang, Guo Wei

PMC · DOI: 10.1371/journal.pcbi.1013824 · PLOS Computational Biology · 2026-01-02

## TL;DR

CellPredX is a new framework that improves cell type annotation across different single-cell data types and protocols using domain adaptation and deep metric learning.

## Contribution

CellPredX introduces a unified, semi-supervised framework for cross-modality cell type annotation with interpretable predictions.

## Key findings

- CellPredX outperforms existing methods in accuracy and robustness across 22 benchmark datasets.
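- The framework provides biologically interpretable results. A gradient-attribution (Integrated Gradients) interpreter identifies the marker genes and feature dimensions that drive each prediction.
- It supports scRNA-seq to scATAC-seq, scATAC-seq to scATAC-seq, and cross-protocol scRNA-seq to scRNA-seq label transfer.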
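These transfers depend on aligning reference and query embeddings, which is the job of the domain adaptation component. The summary does not say which alignment objective CellPredX uses. The sketch below therefore uses maximum mean discrepancy (MMD) with a Gaussian kernel as a swapped-in, commonly used domain-adaptation criterion. It is not confirmed to be CellPredX's loss, and `mmd2`, `gaussian_kernel`, and `sigma` are hypothetical names chosen for illustration.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def gaussian_kernel(x: Vector, y: Vector, sigma: float = 1.0) -> float:
    """RBF kernel k(x, y) = exp(-||x - y||^2 / (2 * sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


def mmd2(ref: List[Vector], query: List[Vector], sigma: float = 1.0) -> float:
    """Biased estimate of squared MMD between reference and query embeddings.

    Hypothetical stand-in for a domain-alignment loss: minimizing it during
    training pulls the two embedding distributions toward each other.
    """
    def mean_kernel(a: List[Vector], b: List[Vector]) -> float:
        total = sum(gaussian_kernel(u, v, sigma) for u in a for v in b)
        return total / (len(a) * len(b))

    return mean_kernel(ref, ref) + mean_kernel(query, query) - 2.0 * mean_kernel(ref, query)


# Toy usage: identical sets give zero discrepancy; a shifted set does not.
ref = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
shifted = [[5.0, 5.0], [6.0, 5.0], [5.0, 6.0]]
print(mmd2(ref, ref))      # 0.0
print(mmd2(ref, shifted))  # clearly positive
```

In a real model these embeddings would come from a trainable encoder, and the MMD term would be one weighted component of the training loss.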

## Abstract

Accurate cell type annotation is fundamental to single-cell analysis, yet remains challenging across heterogeneous datasets and modalities. In particular, transferring labels between scRNA-seq and scATAC-seq data poses unique difficulties due to discrepancies in sequencing protocols and feature spaces. Existing methods typically handle only a subset of these challenges, often requiring scenario-specific adjustments and offering limited interpretability. Here, we present CellPredX, a structurally unified but adaptively parameterized, semi-supervised cross-modality framework for label transfer across scRNA-seq, scATAC-seq, and cross-protocol datasets. While maintaining a unified model architecture and optimization strategy, CellPredX allows adaptive tuning of loss-weight hyperparameters to account for the varying degree of similarity or discrepancy between different reference–query dataset pairs. CellPredX integrates domain adaptation and deep metric learning to align heterogeneous embeddings, and introduces a sparse center loss with an attention mechanism to enhance discriminative representations while suppressing noise. Moreover, an integrated interpreter module based on gradient attribution enables biological interpretability by identifying key markers and feature dimensions driving model predictions. Through extensive benchmarking across scRNA to scATAC, scATAC to scATAC, and scRNA to scRNA transfers, CellPredX consistently outperforms state-of-the-art annotation methods in both accuracy and robustness. The interpreter module further reveals biologically meaningful marker patterns that are consistent with known cell hierarchies. Together, these results demonstrate that CellPredX provides an interpretable and scalable solution for cross-modality cell type annotation in single-cell multi-omic integration.
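The abstract names two mechanisms without giving formulas: a sparse center loss with an attention mechanism, and loss-weight hyperparameters tuned per reference–query pair. The sketch below shows one hypothetical reading of each and should not be taken as the authors' implementation. Sigmoid gates over embedding dimensions stand in for the attention weights, an L1 penalty on those gates supplies the sparsity, and a plain weighted sum stands in for the adaptive loss weighting. The names `sparse_center_loss`, `gate_logits`, `l1_weight`, and `total_loss` are assumptions.

```python
import math
from typing import Dict, List, Sequence


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def sparse_center_loss(
    embeddings: List[Sequence[float]],
    labels: List[int],
    centers: Dict[int, Sequence[float]],
    gate_logits: Sequence[float],
    l1_weight: float = 0.01,
) -> float:
    """Hypothetical attention-gated center loss with an L1 sparsity term.

    Each embedding dimension d gets a gate a_d = sigmoid(gate_logits[d]).
    The compactness term averages the gated squared distance from each
    embedding to its class center. The L1 term on the gates pushes
    uninformative (noisy) dimensions toward zero weight.
    """
    gates = [sigmoid(g) for g in gate_logits]
    compact = 0.0
    for z, y in zip(embeddings, labels):
        c = centers[y]
        compact += sum(a * (zd - cd) ** 2 for a, zd, cd in zip(gates, z, c))
    compact /= len(embeddings)
    sparsity = sum(abs(a) for a in gates)  # gates are positive, so this is their sum
    return compact + l1_weight * sparsity


def total_loss(terms: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of loss terms. The weights play the role of the
    per-dataset-pair loss-weight hyperparameters described in the abstract."""
    return sum(weights[name] * value for name, value in terms.items())
```

With a strongly negative gate logit, a dimension's deviation from its class center contributes almost nothing to the loss. This is the noise-suppression behavior the abstract attributes to the sparse center loss.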

Accurate cell type annotation is crucial for single-cell analysis, yet remains difficult when data come from different modalities or sequencing protocols, such as scRNA-seq and scATAC-seq. Differences in feature space, noise levels, and batch effects often hinder effective label transfer, and many existing methods work only in specific scenarios or lack interpretability. We introduce CellPredX, a unified semi-supervised framework for cell type annotation across scRNA-seq, scATAC-seq, and cross-protocol datasets. CellPredX integrates domain adaptation to align heterogeneous data distributions and deep metric learning to learn discriminative embeddings. A sparse center loss reduces noise and enhances representation quality, while an Integrated Gradients–based interpreter identifies key genes contributing to predictions, improving biological transparency. Across 22 benchmark datasets, CellPredX consistently outperforms state-of-the-art methods in accuracy and robustness. These results show that CellPredX provides an effective, scalable, and interpretable solution for cross-modality cell type annotation in single-cell multi-omics.
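Integrated Gradients attributes a prediction F(x) to each input feature i as (x_i - x'_i) times the integral, over α from 0 to 1, of ∂F/∂x_i evaluated at x' + α(x - x'), where x' is a baseline input. The minimal sketch below approximates that integral with a midpoint Riemann sum. To stay self-contained, it uses a toy logistic "classifier" with an analytic gradient instead of CellPredX's network, and it assumes an all-zero baseline. The model, weights, and function names are hypothetical.

```python
import math
from typing import List, Sequence


def logistic_score(x: Sequence[float], w: Sequence[float], b: float) -> float:
    """Toy differentiable classifier: probability of one cell type."""
    s = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-s))


def logistic_grad(x: Sequence[float], w: Sequence[float], b: float) -> List[float]:
    """Analytic gradient dF/dx_i = F * (1 - F) * w_i."""
    p = logistic_score(x, w, b)
    return [p * (1.0 - p) * wi for wi in w]


def integrated_gradients(
    x: Sequence[float],
    baseline: Sequence[float],
    w: Sequence[float],
    b: float,
    steps: int = 200,
) -> List[float]:
    """Midpoint Riemann approximation of Integrated Gradients along the
    straight path from `baseline` to `x`."""
    total = [0.0] * len(x)
    for k in range(steps):
        alpha = (k + 0.5) / steps  # midpoint of the k-th sub-interval
        point = [xb + alpha * (xi - xb) for xi, xb in zip(x, baseline)]
        for i, g in enumerate(logistic_grad(point, w, b)):
            total[i] += g
    return [(xi - xb) * t / steps for xi, xb, t in zip(x, baseline, total)]


# Toy usage: three "genes", zero-expression baseline.
x = [2.0, 0.0, 1.0]
baseline = [0.0, 0.0, 0.0]
w, b = [1.5, 3.0, -0.5], 0.0
attributions = integrated_gradients(x, baseline, w, b)
print(attributions)  # gene 0 positive, gene 1 zero (unchanged from baseline), gene 2 negative
```

A useful sanity check is the completeness property: the attributions sum to approximately F(x) - F(x'). This is why Integrated Gradients scores can be read as each feature's share of the prediction relative to the baseline.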

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12758788/full.md

## Figures

5 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12758788/full.md

## References

42 references — full list in the complete paper: https://tomesphere.com/paper/PMC12758788/full.md

---
Source: https://tomesphere.com/paper/PMC12758788