# PlasRAG: comprehensive plasmid characterization and retrieval through sequence-text alignment

**Authors:** Yongxin Ji, Jiaojiao Guan, Herui Liao, Jiayu Shang, Yanni Sun

PMC · DOI: 10.1186/s13059-026-03966-7 · 2026-02-07

## TL;DR

PlasRAG characterizes query plasmids across multiple properties and retrieves plasmid DNA from textual queries by aligning DNA sequences with text, supporting the study of plasmids linked to multidrug-resistant and pathogenic bacteria.

## Contribution

PlasRAG introduces a bidirectional multi-modal model for plasmid characterization and retrieval using sequence-text alignment.

## Key findings

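- PlasRAG combines multi-faceted property characterization of query plasmids with retrieval of plasmid DNA from textual queries.
- Its bidirectional multi-modal information retrieval model aligns DNA sequences with textual data, which the authors present as overcoming the limitations of traditional approaches.
- The authors report that their experiments show robust performance and enhanced analytical capabilities, which they attribute to the architectural design.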
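
The bidirectional retrieval idea in the findings above can be illustrated with a minimal sketch. It assumes that plasmid sequences and text descriptions have already been embedded into one shared vector space, and that candidates are ranked by cosine similarity. Both are assumptions made for illustration: this summary does not describe PlasRAG's actual encoders, embedding dimensions, or scoring function. The vectors, the plasmid names, and the `cosine` and `rank` helpers are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank(query_vec, candidates):
    """Return candidate names sorted by descending similarity to the query."""
    return sorted(
        candidates,
        key=lambda name: cosine(query_vec, candidates[name]),
        reverse=True,
    )


# Hypothetical toy embeddings; a real system would obtain these from
# trained sequence and text encoders (not shown, not taken from the paper).
plasmid_vecs = {
    "plasmid_A": [0.9, 0.1, 0.0],
    "plasmid_B": [0.1, 0.9, 0.1],
}
text_vecs = {
    "carries sulfonamide resistance": [1.0, 0.0, 0.0],
    "carries heavy metal resistance": [0.0, 1.0, 0.0],
}

# Direction 1: a query plasmid retrieves the best-matching descriptions.
seq_to_text = rank(plasmid_vecs["plasmid_A"], text_vecs)

# Direction 2: a textual query retrieves the best-matching plasmids.
text_to_seq = rank(text_vecs["carries heavy metal resistance"], plasmid_vecs)

print(seq_to_text[0])
print(text_to_seq[0])
```

Because both modalities share one space in this sketch, the same `rank` function serves characterization (sequence to text) and retrieval (text to sequence); only the roles of query and candidate set are swapped.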

## Abstract

Plasmids play a pivotal role in the emergence of multidrug-resistant and pathogenic bacteria, posing significant clinical challenges. However, the rapidly growing number of unannotated plasmids necessitates comprehensive characterization of their diverse properties. Here, we present PlasRAG, a tool that integrates multi-faceted property characterization of query plasmids and plasmid DNA retrieval based on textual queries. PlasRAG employs a bidirectional multi-modal information retrieval model that aligns DNA sequences with textual data, effectively overcoming the limitations of traditional approaches. Rigorous experiments demonstrate that PlasRAG delivers robust performance and enhanced analytical capabilities, underscoring the effectiveness of its architectural design.

The online version contains supplementary material available at 10.1186/s13059-026-03966-7.

## Full-text entities

- **Genes:** ABL2 (ABL proto-oncogene 2, non-receptor tyrosine kinase) [NCBI Gene 27] {aka ABLL, ARG}
- **Diseases:** DL (MESH:D007859), LLM (MESH:D007806), AMR (MESH:D060467), VF (MESH:C537182), multidrug (MESH:D018088), infectious diseases (MESH:D003141), ESM-2 (MESH:D020803), PS (MESH:D015619), bacterial disease (MESH:D001424), hallucination (MESH:D006212)
- **Chemicals:** Sulfonamide (MESH:D013449), Sulfone (MESH:D013450), heavy metal (MESH:D019216), AAs (MESH:D000596), CPU (-)
- **Species:** Enterobacteriaceae (enterobacteria, family) [taxon 543], Homo sapiens (human, species) [taxon 9606]
- **Cell lines:** ESM-2 — Homo sapiens (Human), Transformed cell line (CVCL_XI05), S2T — Drosophila melanogaster (Fruit fly), Spontaneously immortalized cell line (CVCL_Z232)

## Figures

9 figures with captions in the complete paper: https://tomesphere.com/paper/PMC12977624/full.md

---
Source: https://tomesphere.com/paper/PMC12977624