# Improving atlas-scale single-cell annotation models with hierarchical cross-entropy loss

**Authors:** Sebastiano Cultrera di Montesano, Davide D’Ascenzo, Srivatsan Raghavan, Ava P. Amini, Peter S. Winter, Lorin Crawford

PMC · DOI: 10.1038/s43588-025-00945-z · Nature Computational Science · 2026-01-30

## TL;DR

This paper introduces a hierarchical cross-entropy loss for cell type annotation of single-cell RNA sequencing data. By building the cell type ontology into the training objective, it improves out-of-distribution performance by 12–15% without added computational cost.

## Contribution

The paper contributes a hierarchical cross-entropy loss that incorporates the cell type ontology into the training objective; it applies to architectures ranging from linear models to transformers.

## Key findings

- The hierarchical cross-entropy loss improves out-of-distribution performance by 12–15% across architectures ranging from linear models to transformers.
- Incorporating biological hierarchy into training objectives enhances annotation accuracy without increasing computational costs.
- The authors argue that new data generation should focus on improving the connectivity among annotated cell types, which they suggest is likely to yield more generalizable algorithms than solely increasing model complexity.

## Abstract

Accurately annotating cell types is essential for extracting biological insight from single-cell RNA sequencing data. Although cell types are naturally organized into hierarchical ontologies, most computational models do not explicitly incorporate this structure into their training objectives. Here, we introduce a hierarchical cross-entropy loss that aligns model objectives with biological structure. Applied to architectures ranging from linear models to transformers, this simple modification improves out-of-distribution performance by 12–15% without added computational cost. Critically, we underscore the need to focus on new data generation that improves the connectivity among annotated cell types. Our work suggests that this is likely to yield more generalizable algorithms than would solely increasing model complexity.

A hierarchical cross-entropy loss is presented, which incorporates ontology structure into training and improves the out-of-distribution performance of large-scale single-cell annotation models without additional computational cost.
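To make the idea of an ontology-aware objective concrete, here is a minimal NumPy sketch of one plausible formulation. It is an assumption for illustration, not the formulation confirmed by this summary: the loss for a cell is the negative log of the softmax mass placed on its annotated label together with all of that label's descendants, so a prediction of a finer subtype is not penalized when the annotation is a coarser parent. The toy ontology, the class names, and the helpers `descendant_matrix` and `hierarchical_ce` are all hypothetical.

```python
import numpy as np

# Hypothetical toy ontology (child -> parent, None marks a root).
# The class names are illustrative only, not taken from the paper.
PARENT = {
    "T cell": None,
    "CD4 T cell": "T cell",
    "CD8 T cell": "T cell",
    "monocyte": None,
    "CD14 monocyte": "monocyte",
}
CLASSES = list(PARENT)
INDEX = {name: i for i, name in enumerate(CLASSES)}


def descendant_matrix(parent, index):
    """Return R with R[i, j] = 1 if class j is class i or one of its descendants."""
    R = np.eye(len(parent))
    for name, ancestor in parent.items():
        # Walk up the tree and mark `name` as a descendant of every ancestor.
        while ancestor is not None:
            R[index[ancestor], index[name]] = 1.0
            ancestor = parent[ancestor]
    return R


def softmax(logits):
    """Numerically stable row-wise softmax."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)


def hierarchical_ce(logits, labels, R):
    """Mean negative log of the probability mass on each label and its descendants.

    Assumed formulation: with R equal to the identity matrix this reduces
    to the standard (flat) cross-entropy loss.
    """
    probs = softmax(logits)
    mass = (probs * R[labels]).sum(axis=1)
    return -np.log(mass).mean()


R = descendant_matrix(PARENT, INDEX)
# One cell annotated with the coarse label "T cell", while the model
# puts most of its probability on the finer subtype "CD4 T cell".
logits = np.array([[0.0, 3.0, 0.0, 0.0, 0.0]])
labels = np.array([INDEX["T cell"]])
flat_loss = hierarchical_ce(logits, labels, np.eye(len(CLASSES)))
hier_loss = hierarchical_ce(logits, labels, R)
print(f"flat CE: {flat_loss:.3f}, hierarchical CE: {hier_loss:.3f}")
```

In this sketch the only work beyond a standard cross-entropy is a row lookup in a precomputed matrix and one elementwise product, which is consistent in spirit with the abstract's claim of no added computational cost, although the paper's actual implementation may differ. For leaf labels the matrix row has a single nonzero entry, so the loss coincides with flat cross-entropy.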

## Full-text entities

- **Genes:** CD14 (CD14 molecule) [NCBI Gene 929], CD8A (CD8 subunit alpha) [NCBI Gene 925] {aka CD8, CD8alpha, IMD116, Leu2, p32}, CD4 (CD4 molecule) [NCBI Gene 920] {aka CD4mut, IMD79, Leu-3, OKT4D, T4}
- **Species:** Homo sapiens (human, species) [taxon 9606]

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC13021517/full.md

## Figures

2 figures with captions in the complete paper: https://tomesphere.com/paper/PMC13021517/full.md

## References

4 references — full list in the complete paper: https://tomesphere.com/paper/PMC13021517/full.md

---
Source: https://tomesphere.com/paper/PMC13021517