# SummArIzeR: simplifying cross-database enrichment result clustering and annotation via large language models

**Authors:** Marie Brinkmann, Michael Bonelli, Anela Tosevska

PMC · DOI: 10.1093/bioinformatics/btag102 · Bioinformatics · 2026-02-28

## TL;DR

SummArIzeR is an R package that simplifies the interpretation of enrichment analysis results by clustering and annotating them using large language models.

## Contribution

The contribution is an R package that clusters enrichment results from multiple databases by shared genes and annotates each cluster with a large language model, an approach the authors describe as unbiased and fast.

## Key findings

- SummArIzeR clusters enrichment results based on shared genes and calculates pooled P-values for each cluster.
- The package supports cluster annotation via large language models; the authors report clustering comparable to manual curation, with better grouping by shared underlying genes.
- SummArIzeR provides fast and intuitive visualization of enrichment analysis results.
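
To make the idea of clustering terms by shared genes concrete, here is a minimal sketch in Python. It is not the package's R implementation: this summary does not state which similarity measure or clustering algorithm SummArIzeR uses, so the Jaccard index, the `threshold` cutoff, the connected-components grouping, and the names `jaccard` and `cluster_terms` are all assumptions made for illustration. The gene sets are toy data.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard index of two gene sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_terms(term_genes, threshold=0.5):
    """Group terms into connected components of a gene-overlap graph.

    term_genes: dict mapping term name -> set of gene symbols.
    threshold:  hypothetical cutoff; two terms are linked when their
                Jaccard index is at least this value.
    """
    parent = {term: term for term in term_genes}

    def find(term):
        # Walk up to the component root, shortening the path as we go.
        while parent[term] != term:
            parent[term] = parent[parent[term]]
            term = parent[term]
        return term

    for t1, t2 in combinations(term_genes, 2):
        if jaccard(term_genes[t1], term_genes[t2]) >= threshold:
            parent[find(t1)] = find(t2)

    clusters = {}
    for term in term_genes:
        clusters.setdefault(find(term), []).append(term)
    return sorted(sorted(members) for members in clusters.values())


# Toy input: terms from different databases with illustrative gene sets.
example = {
    "GO: response to hypoxia": {"HIF1A", "VEGFA", "EPAS1", "EGLN3"},
    "KEGG: HIF-1 signaling": {"HIF1A", "VEGFA", "EGLN3", "LDHA"},
    "Reactome: calcium signaling": {"CALM1", "ATP2A2", "RYR2"},
}
print(cluster_terms(example))
```

In this toy input the two hypoxia-related terms share three of five distinct genes (Jaccard index 0.6), so they fall into one cluster, while the calcium term stays on its own. Raising `threshold` above 0.6 would split them again, which shows how sensitive such a grouping is to the chosen cutoff.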

## Abstract

Enrichment analysis across multiple databases often yields highly redundant results because terms overlap, which complicates the interpretation of biological data. To address this, we developed SummArIzeR, an R package that clusters and annotates enrichment results across multiple databases, enabling fast, intuitive interpretation and comparison across multiple conditions. SummArIzeR clusters enrichment results based on shared genes, calculates a pooled P-value for each cluster, and facilitates cluster annotation using large language models. It also provides easily interpretable visualizations of the results.
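
As an illustration of what a pooled P-value per cluster can look like, the sketch below combines the P-values of a cluster's terms with Fisher's method. This is an assumption: the summary does not say which pooling method SummArIzeR applies, and the function name `fisher_pooled_p` is hypothetical. Fisher's method also assumes independent tests, which terms sharing genes do not strictly satisfy, so treat the result as a rough summary rather than a calibrated probability.

```python
import math


def fisher_pooled_p(p_values):
    """Combine P-values with Fisher's method (assumed pooling rule).

    The statistic X = -2 * sum(ln p_i) follows a chi-square distribution
    with 2k degrees of freedom under the null. For even degrees of
    freedom the upper tail has a closed form, so no external library
    is needed:
        P(X >= x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    """
    if not p_values:
        raise ValueError("need at least one P-value")
    if any(not 0.0 < p <= 1.0 for p in p_values):
        raise ValueError("P-values must lie in (0, 1]")

    half_stat = -sum(math.log(p) for p in p_values)  # equals X / 2
    term = 1.0    # running value of (x/2)^i / i!, starting at i = 0
    series = 1.0  # partial sum of the closed-form series
    for i in range(1, len(p_values)):
        term *= half_stat / i
        series += term
    return math.exp(-half_stat) * series


# Made-up P-values for three terms in one cluster.
print(fisher_pooled_p([0.01, 0.03, 0.20]))
```

With a single P-value the function returns that value unchanged, which is a quick sanity check on the closed-form tail.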

Compared to existing tools, SummArIzeR provides unbiased and fast cluster annotation using large language models. We demonstrate that SummArIzeR achieves clustering comparable to manual curation while offering superior grouping based on shared underlying genes.

The SummArIzeR package is available as an open-source R package, with a comprehensive user manual provided in its GitHub repository: https://github.com/bonellilab/SummArIzeR.

## Full-text entities

- **Diseases:** Renal cell carcinoma (MESH:D002292), carbon (MESH:D002249), Hypoxia (MESH:D000860), Cancer (MESH:D009369)
- **Chemicals:** glycosaminoglycan (MESH:D006025), calcium (MESH:D002118), LLM (-)

## Full text

_Full body text omitted from this summary view._ Fetch the complete paper as Markdown: https://tomesphere.com/paper/PMC13005729/full.md

## Figures

3 figures with captions in the complete paper: https://tomesphere.com/paper/PMC13005729/full.md

## References

27 references — full list in the complete paper: https://tomesphere.com/paper/PMC13005729/full.md

---
Source: https://tomesphere.com/paper/PMC13005729