GGBound: A Genome-Grounded Agent for Microbial Life-Boundary Prediction
Hanbo Huang, Xuan Gong, Jing Wang, Lei Bai, Xiang Xiao, Weishu Zhao, Shiyu Liang

TL;DR
GGBound is a genome-conditioned, tool-augmented language model agent that predicts microbial life boundaries by integrating genomic data, environmental factors, and metabolic tools. At 4B parameters, it matches or exceeds larger models on this specialized task.
Contribution
This work introduces a genome-to-physiology prediction framework built on a tool-augmented LLM that uses a counterfactual gene-grounding reward, aimed at bridging the genotype-phenotype gap.
Findings
The 4B-parameter GGBound matches or exceeds larger LLMs in microbial boundary prediction.
Genome-token fusion and dynamic tool use significantly improve model performance.
The counterfactual gene-grounding reward enhances causal understanding of genomic influence.
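The summary does not spell out how the counterfactual gene-grounding reward is computed. As a rough illustration only, one plausible reading is to compare prediction quality with the real genome against prediction quality after ablating a set of genes. Everything in the sketch below is an assumption made for illustration and is not taken from the paper: the interval-overlap score, the function names, the toy predictor, and the gene names are all hypothetical.

```python
from typing import Callable, List, Set, Tuple

# Hypothetical types: a genome is a list of gene identifiers, and a
# predictor maps a genome to a predicted (low, high) viability interval.
Interval = Tuple[float, float]
Predictor = Callable[[List[str]], Interval]


def interval_iou(pred: Interval, true: Interval) -> float:
    """Overlap length divided by covering-span length; 0.0 if disjoint."""
    inter = max(0.0, min(pred[1], true[1]) - max(pred[0], true[0]))
    span = max(pred[1], true[1]) - min(pred[0], true[0])
    return inter / span if span > 0 else 0.0


def counterfactual_reward(
    predict: Predictor,
    genome: List[str],
    ablated: Set[str],
    true: Interval,
) -> float:
    """Score with the real genome minus score with `ablated` genes removed.

    A large positive value means the prediction depends on the ablated
    genes; zero means removing them changed nothing.
    """
    factual = interval_iou(predict(genome), true)
    edited = [g for g in genome if g not in ablated]
    counterfactual = interval_iou(predict(edited), true)
    return factual - counterfactual


def toy_predict(genome: List[str]) -> Interval:
    """Toy stand-in for a model: one gene shifts the temperature range."""
    return (40.0, 80.0) if "heat_shock" in genome else (20.0, 45.0)


genome = ["heat_shock", "flagellin"]
true_range = (40.0, 80.0)
relevant = counterfactual_reward(toy_predict, genome, {"heat_shock"}, true_range)
irrelevant = counterfactual_reward(toy_predict, genome, {"flagellin"}, true_range)
```

In this toy setup, removing the gene the predictor relies on yields a high reward, while removing an unrelated gene yields zero, which is the kind of signal a gene-grounding reward would need to separate.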
Abstract
Characterizing the physiological life boundaries of microbial strains, including viable temperature, pH, salinity, substrate utilization, and morphology, is central to biotechnology and ecology, yet traditionally requires exhaustive in vitro screening. Existing computational approaches either treat physiological traits as isolated supervised targets or repurpose biological foundation models as static encoders, leaving the genotype-to-physiology gap largely unbridged. We formulate microbial life-boundary prediction as a unified genome-to-physiology task and address it with a genome-conditioned, tool-augmented LLM agent. To support this task, we curate a strain-centric benchmark from IJSEM, NCBI, and BacDive covering 1,525 strains and 6,448 instances across viability intervals, environmental optima, substrate utilization, categorical traits, and morphology. Architecturally, the agent…
