# PuMA: PubMed gene/cell type-relation Atlas

**Authors:** Lucas Bickmann, Sarah Sandmann, Carolin Walter, Julian Varghese

PMC · DOI: 10.1186/s12859-025-06236-8 · BMC Bioinformatics · 2025-07-29

## TL;DR

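PuMA is a locally hosted web tool that supports automatic cell type annotation by extracting gene and cell type mentions from PubMed articles, ranking the resulting gene-to-cell type relations, and visualizing them interactively. In the authors' evaluation it is competitive with an extensive manually curated marker database.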

## Contribution

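PuMA introduces an automated, literature-driven system for cell type annotation. A pretrained named entity recognition model extracts gene and cell type concepts from PubMed, links them to biomedical ontologies, and ranks gene-to-cell type relations. The system runs as a local web interface and complements manually curated marker gene databases.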

## Key findings

- PuMA is competitive with an extensive manually curated database across three gold standard datasets covering two species, mouse and human.
- The tool provides traceable gene-to-cell type relations based on PubMed articles.
- Interactive graph visualizations and search tools enhance exploration of gene-cell relationships.

## Abstract

Rapid extraction and visualization of cell-specific gene expression is important for automatic cell type annotation, e.g. in single-cell analysis. There is an emerging field in which tools such as curated databases or machine learning methods are used to support cell type annotation. However, complementary approaches that efficiently incorporate the latest knowledge from free-text articles in literature databases, such as PubMed, are understudied.

This work introduces the PubMed Gene/Cell type-Relation Atlas (PuMA) which provides a local, easy-to-use web-interface to facilitate literature-driven cell type annotation. It utilizes a pretrained machine learning based named entity recognition model in order to extract gene and cell type concepts from PubMed, links biomedical ontologies, and suggests gene to cell type relations based on a ranking score. It includes a search tool for genes and cell types, additionally providing an interactive graph visualization for exploring cross-relations. Each result is fully traceable by linking the relevant PubMed articles.
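The extract-then-rank idea described above can be illustrated with a minimal sketch. It assumes the named entity recognition step has already produced, for each article, a set of gene mentions and a set of cell type mentions. The record layout, the article identifiers, and the scoring rule (number of co-mentioning articles) are illustrative assumptions, not PuMA's actual data model or ranking score, which this summary does not specify.

```python
from collections import defaultdict

# Hypothetical NER output: one record per article. The identifiers and
# entity sets are invented for illustration only.
articles = [
    {"pmid": "1001", "genes": {"OLIG2"}, "cell_types": {"oligodendrocyte"}},
    {"pmid": "1002", "genes": {"OLIG2", "WT1"},
     "cell_types": {"oligodendrocyte", "podocyte"}},
    {"pmid": "1003", "genes": {"WT1"}, "cell_types": {"podocyte"}},
]


def build_relations(articles):
    """Map each (gene, cell type) pair to the set of articles mentioning both."""
    relations = defaultdict(set)
    for article in articles:
        for gene in article["genes"]:
            for cell_type in article["cell_types"]:
                relations[(gene, cell_type)].add(article["pmid"])
    return relations


def rank_cell_types(relations, gene):
    """Rank cell types for a gene by number of supporting articles.

    Assumed scoring rule: plain co-mention count, ties broken alphabetically.
    Each result keeps its article identifiers so it stays traceable.
    """
    hits = [
        (cell_type, sorted(pmids))
        for (g, cell_type), pmids in relations.items()
        if g == gene
    ]
    return sorted(hits, key=lambda hit: (-len(hit[1]), hit[0]))


relations = build_relations(articles)
ranking = rank_cell_types(relations, "OLIG2")
for cell_type, pmids in ranking:
    print(cell_type, len(pmids), pmids)
```

Keeping the supporting article identifiers next to every relation is what makes each suggestion traceable back to its sources. A real ranking score would likely also need to correct for very frequently mentioned genes or cell types, which a raw co-mention count does not do.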

This work enables researchers to analyse and automate cell type annotation based on PubMed articles. It complements manually curated marker gene databases and enables interactive visualizations. The evaluation shows that PuMA is competitive against an extensive manually curated database across three gold standard datasets and two species, mouse and human. The software framework is freely available and enables regular article imports for incremental knowledge updates.

GitLab: https://imigitlab.uni-muenster.de/published/PuMA/
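As a rough illustration of how literature-derived relations could drive annotation, the sketch below assigns candidate cell types to a cluster from its marker genes by summing per-gene evidence. The evidence table, its counts, and the `annotate_cluster` helper are hypothetical. The additive score is an assumption for illustration and is not the scoring used in the paper's evaluation.

```python
from collections import Counter

# Hypothetical evidence table: (gene, cell type) -> number of supporting
# articles. All counts are invented for illustration.
evidence = {
    ("OLIG2", "oligodendrocyte"): 5,
    ("OLIG2", "astrocyte"): 1,
    ("WT1", "podocyte"): 4,
    ("Nppb", "cardiomyocyte"): 3,
    ("Slc8a1", "cardiomyocyte"): 2,
}


def annotate_cluster(marker_genes, evidence):
    """Return candidate cell types for a cluster, best first.

    Assumed scoring rule: sum the article counts of every relation whose
    gene is among the cluster's marker genes.
    """
    scores = Counter()
    for (gene, cell_type), count in evidence.items():
        if gene in marker_genes:
            scores[cell_type] += count
    return scores.most_common()


result = annotate_cluster({"Nppb", "Slc8a1"}, evidence)
print(result)
```

A cluster whose markers have no literature evidence yields an empty list, which a caller could treat as "unannotated" rather than forcing a label.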

## Linked entities

- **Species:** Mus musculus (taxon 10090), Homo sapiens (taxon 9606)

## Full-text entities

- **Genes:** Slc8a1 (solute carrier family 8 (sodium/calcium exchanger), member 1) [NCBI Gene 20541] {aka D930008O12Rik, Ncx1}, F3 (coagulation factor III, tissue factor) [NCBI Gene 2152] {aka CD142, TF, TFA}, WT1 (WT1 transcription factor) [NCBI Gene 7490] {aka AWT1, GUD, NPHS4, WAGR, WIT-2, WT-1}, A1BG (alpha-1-B glycoprotein) [NCBI Gene 1] {aka A1B, ABG, GAB, HYST2477}, Bbc3 (BCL2 binding component 3) [NCBI Gene 170770] {aka PUMA, PUMA/JFY1}, BBC3 (BCL2 binding component 3) [NCBI Gene 27113] {aka JFY-1, JFY1, PUMA}, OLIG2 (oligodendrocyte transcription factor 2) [NCBI Gene 10215] {aka BHLHB1, OLIGO2, PRKCBP2, RACK17, bHLHe19}, WTIP (WT1 interacting protein) [NCBI Gene 126374], Nppb (natriuretic peptide type B) [NCBI Gene 18158] {aka BNF, BNP, Iso-ANP}
- **Diseases:** PMC (MESH:D020210)
- **Chemicals:** CM2 (-)
- **Species:** Mus musculus (house mouse, species) [taxon 10090], Homo sapiens (human, species) [taxon 9606]
- **Cell lines:** CM2 — Homo sapiens (Human), Colon carcinoma, Cancer cell line (CVCL_A628)

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC12308971/full.md

## Figures

6 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12308971/full.md

## References

3 references — full list in the complete paper: https://tomesphere.com/paper/PMC12308971/full.md

---
Source: https://tomesphere.com/paper/PMC12308971