PlasRAG: comprehensive plasmid characterization and retrieval through sequence-text alignment
Yongxin Ji, Jiaojiao Guan, Herui Liao, Jiayu Shang, Yanni Sun

TL;DR
PlasRAG is a tool that characterizes query plasmids and retrieves plasmid DNA from textual queries by aligning DNA sequences with text, supporting the study of multidrug-resistant and pathogenic bacteria.
Contribution
PlasRAG introduces a bidirectional multi-modal model for plasmid characterization and retrieval using sequence-text alignment.
Findings
PlasRAG integrates multi-faceted property characterization of query plasmids with plasmid DNA retrieval based on textual queries.
The tool uses a bidirectional sequence-text alignment model to overcome the limitations of traditional approaches.
Experiments show PlasRAG's robust performance and enhanced analytical capabilities.
Abstract
Plasmids play a pivotal role in the emergence of multidrug-resistant and pathogenic bacteria, posing significant clinical challenges. However, the rapidly growing number of unannotated plasmids necessitates comprehensive characterization of their diverse properties. Here, we present PlasRAG, a tool that integrates multi-faceted property characterization of query plasmids and plasmid DNA retrieval based on textual queries. PlasRAG employs a bidirectional multi-modal information retrieval model that aligns DNA sequences with textual data, effectively overcoming the limitations of traditional approaches. Rigorous experiments demonstrate that PlasRAG delivers robust performance and enhanced analytical capabilities, underscoring the effectiveness of its architectural design. The online version contains supplementary material available at 10.1186/s13059-026-03966-7.
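The bidirectional retrieval idea in the abstract can be illustrated with a minimal sketch. It assumes that sequences and texts are embedded into one shared vector space and ranked by cosine similarity; the abstract only states that DNA sequences are aligned with textual data, so this scoring scheme, the function names, the plasmid labels, and the toy vectors below are all hypothetical and are not taken from PlasRAG itself.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def retrieve(query_vec, candidates, top_k=1):
    """Return the names of the top_k candidates most similar to query_vec."""
    ranked = sorted(
        candidates.items(),
        key=lambda item: cosine(query_vec, item[1]),
        reverse=True,
    )
    return [name for name, _ in ranked[:top_k]]

# Toy embeddings (made up for illustration). In a real system these would
# come from a sequence encoder and a text encoder trained to share a space.
plasmid_vecs = {
    "plasmid_A": [0.9, 0.1, 0.0],
    "plasmid_B": [0.1, 0.8, 0.3],
}
text_vecs = {
    "carries beta-lactam resistance genes": [1.0, 0.0, 0.1],
    "conjugative, broad host range": [0.0, 0.9, 0.4],
}

# Direction 1: textual query -> plasmid retrieval.
hits = retrieve(text_vecs["carries beta-lactam resistance genes"], plasmid_vecs)

# Direction 2: query plasmid -> property description (characterization).
props = retrieve(plasmid_vecs["plasmid_B"], text_vecs)

print(hits)
print(props)
```

The same ranking function serves both directions because, under this assumption, the two modalities share one embedding space; only the roles of query and candidate set are swapped.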
Taxonomy
Topics: Biomedical Text Mining and Ontologies · Machine Learning in Bioinformatics · Topic Modeling
