# CeLLTra: aligning cell names with gene expression via a pathway-informed transformer

**Authors:** Zhao Li, Zaiyi Zheng, Rongbin Li, Wenbo Chen, Yuntao Yang, Meer A Ali, Jundong Li, W Jim Zheng

PMC · DOI: 10.1093/bioinformatics/btaf655 · Bioinformatics · 2025-12-05

## TL;DR

CeLLTra is a contrastive learning method that aligns cell-type names with scRNA-Seq gene expression profiles through a pathway-informed Transformer, improving cell-type annotation.

## Contribution

CeLLTra introduces a contrastive learning framework with a pathway-informed Transformer for accurate cell-type prediction.

## Key findings

- On a large-scale human scRNA-Seq dataset, CeLLTra outperforms state-of-the-art methods in both supervised and zero-shot cell-type prediction.
- The model generalizes well to external datasets and improves clustering performance.
- It enables better characterization of cancerous cell states in tumor-infiltrating myeloid cells from non-small cell lung cancer patients.
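
To make the zero-shot finding concrete, the sketch below shows one common way a model with aligned text and expression embeddings can label a cell type it was never trained on: embed the candidate cell-type names, embed the cell, and pick the most similar name. This is an illustrative assumption, not the procedure confirmed by this summary; the function names, the cosine-similarity scoring rule, and the toy vectors are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def zero_shot_predict(cell_embedding, label_embeddings):
    """Return the label whose text embedding is most similar to the cell embedding."""
    return max(
        label_embeddings,
        key=lambda name: cosine(cell_embedding, label_embeddings[name]),
    )


# Toy, made-up embeddings (hypothetical values, not taken from the paper).
# In a real system the label vectors would come from a language model and
# the cell vector from an expression encoder trained into the same space.
label_embeddings = {
    "T cell": [0.9, 0.1, 0.0],
    "B cell": [0.1, 0.9, 0.0],
    "macrophage": [0.0, 0.2, 0.9],
}
cell = [0.1, 0.3, 0.8]

predicted = zero_shot_predict(cell, label_embeddings)
print(predicted)
```

Because the label set is just a dictionary of name embeddings, a new cell type can be added at prediction time without retraining, which is the property that "zero-shot" refers to here.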

## Abstract

Single-cell RNA sequencing (scRNA-Seq) technology enables detailed exploration of gene expression at the individual cell level, which is crucial for annotating cell types and understanding cellular diversity. Traditional methods for cell-type annotation often rely on marker genes and manual labeling, and they are hampered by low data quality and incomplete reference datasets.

We developed CeLLTra, a novel contrastive learning framework that leverages a Transformer-based model integrating biological pathway information to group genes into super tokens, effectively capturing comprehensive gene expression from scRNA-Seq data. By combining this pathway-informed Transformer with a pretrained domain-specific language model, CeLLTra accurately aligns cell-type annotations with gene expression profiles. Evaluations on a large-scale human scRNA-Seq dataset showed that CeLLTra significantly outperformed state-of-the-art methods in supervised and zero-shot cell-type prediction. Additionally, CeLLTra generalized well to external datasets, improving clustering performance and enabling better characterization of cancerous cell states in tumor-infiltrating myeloid cells from non-small cell lung cancer patients.
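
The two ideas in this paragraph, grouping genes into pathway-level super tokens and contrastively aligning expression embeddings with cell-type name embeddings, can be sketched in a few lines. The code below is a minimal illustration under stated assumptions, not the authors' implementation: mean pooling per pathway, the symmetric cross-entropy (CLIP-style) loss, the temperature of 0.07, and the toy pathway groupings are all assumptions introduced here, and the real model uses a Transformer and a pretrained language model rather than raw vectors.

```python
import math


def pathway_super_tokens(expression, pathways):
    """Pool gene-level expression into one value per pathway.

    Mean pooling is an assumption made for illustration; genes missing
    from the expression profile are skipped.
    """
    tokens = {}
    for name, genes in pathways.items():
        values = [expression[g] for g in genes if g in expression]
        tokens[name] = sum(values) / len(values) if values else 0.0
    return tokens


def _normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def _cross_entropy_diag(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        m = max(row)  # subtract the max for numerical stability
        log_sum = m + math.log(sum(math.exp(x - m) for x in row))
        total += log_sum - row[i]
    return total / len(logits)


def symmetric_contrastive_loss(cell_emb, text_emb, temperature=0.07):
    """Pair i of cell_emb and text_emb is a positive; all other pairs are negatives."""
    cells = [_normalize(v) for v in cell_emb]
    texts = [_normalize(v) for v in text_emb]
    logits = [
        [sum(a * b for a, b in zip(c, t)) / temperature for t in texts]
        for c in cells
    ]
    transposed = [list(col) for col in zip(*logits)]
    # Average the cell-to-text and text-to-cell directions.
    return 0.5 * (_cross_entropy_diag(logits) + _cross_entropy_diag(transposed))


# Toy inputs: the pathway groupings are hypothetical, not real pathway definitions.
expression = {"CD3D": 2.0, "CD3E": 4.0, "MS4A1": 0.0, "CD79A": 1.0}
pathways = {
    "toy_pathway_A": ["CD3D", "CD3E"],
    "toy_pathway_B": ["MS4A1", "CD79A", "GENE_NOT_MEASURED"],
}
tokens = pathway_super_tokens(expression, pathways)

aligned = symmetric_contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[1.0, 0.0], [0.0, 1.0]])
shuffled = symmetric_contrastive_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
print(tokens, aligned < shuffled)
```

Pooling by pathway shortens the input sequence from one token per gene to one per pathway, and the loss is lower when each cell embedding sits closest to its own cell-type name embedding, which is the alignment the paragraph describes.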

CeLLTra is freely available on GitHub (https://github.com/WJZheng-group/CeLLTra) and Zenodo (https://doi.org/10.5281/zenodo.17666735). The datasets underlying this article, GSE201333 and GSE127465, are publicly available and can be freely accessed on the Gene Expression Omnibus repository.

## Linked entities

- **Diseases:** non-small cell lung cancer (MONDO:0005233)

## Full-text entities

- **Diseases:** lung cancer (MESH:D008175), Cancer (MESH:D009369), non-small cell lung cancer (MESH:D002289)
- **Species:** Homo sapiens (human, species) [taxon 9606]
- **Cell lines:** LM22 — Homo sapiens (Human), Astrocytoma, Cancer cell line (CVCL_A1IU)

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12881829/full.md

## Figures

5 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12881829/full.md

## References

42 references — full list in the complete paper: https://tomesphere.com/paper/PMC12881829/full.md

---
Source: https://tomesphere.com/paper/PMC12881829