PhageBench: Can LLMs Understand Raw Bacteriophage Genomes?
Yusen Hou, Weicai Long, Haitao Hu, Houcheng Su, Junning Feng, Yanlin Zhang

TL;DR
PhageBench is a new benchmark to evaluate large language models' ability to interpret raw bacteriophage genomes, revealing their strengths and limitations in biological reasoning tasks.
Contribution
This work introduces PhageBench, the first benchmark for assessing LLMs on phage genome understanding across multiple bioinformatics tasks.
Findings
LLMs outperform random baselines in phage contig identification.
LLMs show promise in host prediction tasks.
Significant limitations remain in complex reasoning and functional localization.
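For readers unfamiliar with what "outperforming a random baseline" means in practice, the comparison can be sketched as follows. This is an illustrative sketch only, not the PhageBench evaluation code: the binary "phage" / "non-phage" label set, the toy predictions, and all function names are hypothetical assumptions, and the paper's actual metrics and baselines may differ.

```python
from collections import Counter


def accuracy(predictions, labels):
    """Fraction of predictions that exactly match the reference labels."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        raise ValueError("labels must be non-empty")
    correct = sum(p == y for p, y in zip(predictions, labels))
    return correct / len(labels)


def uniform_random_baseline(labels):
    """Expected accuracy of guessing uniformly among the classes present."""
    return 1.0 / len(set(labels))


def majority_baseline(labels):
    """Accuracy of always predicting the most frequent class."""
    most_common_count = Counter(labels).most_common(1)[0][1]
    return most_common_count / len(labels)


# Hypothetical toy data: reference labels and model outputs for six contigs.
labels = ["phage", "non-phage", "phage", "non-phage", "phage", "non-phage"]
predictions = ["phage", "non-phage", "phage", "phage", "phage", "non-phage"]

model_acc = accuracy(predictions, labels)
print(f"model accuracy:          {model_acc:.3f}")
print(f"uniform random baseline: {uniform_random_baseline(labels):.3f}")
print(f"majority-class baseline: {majority_baseline(labels):.3f}")
```

Reporting a majority-class baseline alongside the uniform one is a common precaution: on an imbalanced label set, a model can beat uniform guessing while doing no better than always predicting the most frequent class.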
Abstract
Bacteriophages, often referred to as the dark matter of the biosphere, play a critical role in regulating microbial ecosystems and serve as alternatives to antibiotics. Thus, accurate interpretation of their genomes holds significant scientific and practical value. While general-purpose Large Language Models (LLMs) excel at understanding biological texts, their ability to directly interpret raw nucleotide sequences and perform biological reasoning remains underexplored. To address this, we introduce PhageBench, the first benchmark designed to evaluate phage genome understanding by mirroring the workflow of bioinformatics experts. The dataset contains 5,600 high-quality samples covering five core tasks across three stages: Screening, Quality Control, and Phenotype Annotation. Our evaluation of eight LLMs reveals that general-purpose reasoning models significantly outperform random baselines in…
