BioMiner: A Multi-modal System for Automated Mining of Protein-Ligand Bioactivity Data from Literature
Jiaxian Yan, Jintao Zhu, Yuhang Yang, Qi Liu, Kai Zhang, Zaixi Zhang, Xukai Liu, Boyan Zhang, Kaiyuan Gao, Jinchuan Xiao, Enhong Chen

TL;DR
BioMiner is a multi-modal system that automates the extraction of protein-ligand bioactivity data from the literature, combining semantic reasoning with chemical structure reconstruction to support drug discovery.
Contribution
The paper introduces BioMiner, a multi-modal framework that separates bioactivity semantic interpretation from ligand structure construction, and establishes BioVista, a large benchmark for bioactivity data extraction.
Findings
Achieved an F1 score of 0.32 for bioactivity triplets.
Built a database of 82,262 entries from 11,683 papers, improving downstream models by 3.9%.
Accelerated bioactivity annotation with a 5.59-fold speed increase.
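For readers unfamiliar with how an F1 score over extracted triplets can be computed, the following is a minimal sketch, not the paper's evaluation code. It assumes a triplet is a (protein, ligand, activity) record and that a prediction counts as correct only when all three fields match a gold record exactly; the actual matching criteria used for BioMiner may differ. The names `Triplet` and `triplet_f1` are hypothetical.

```python
from typing import Iterable, NamedTuple


class Triplet(NamedTuple):
    """Hypothetical record for one extracted bioactivity data point."""
    protein: str   # target name or identifier
    ligand: str    # e.g. a canonical SMILES string
    activity: str  # e.g. "IC50=12 nM"


def triplet_f1(predicted: Iterable[Triplet], gold: Iterable[Triplet]) -> float:
    """F1 under exact matching of all three fields (an assumption)."""
    pred_set, gold_set = set(predicted), set(gold)
    true_positives = len(pred_set & gold_set)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(pred_set)
    recall = true_positives / len(gold_set)
    return 2 * precision * recall / (precision + recall)


# Toy data, invented purely for illustration.
gold = [
    Triplet("EGFR", "CCO", "IC50=12 nM"),
    Triplet("EGFR", "CCN", "IC50=40 nM"),
    Triplet("BRAF", "CCC", "Ki=5 nM"),
]
predicted = [
    Triplet("EGFR", "CCO", "IC50=12 nM"),  # matches a gold record
    Triplet("EGFR", "CCN", "IC50=4 nM"),   # wrong activity value
]

score = triplet_f1(predicted, gold)
```

Under exact matching, a single wrong field (here, the activity value) makes the whole triplet count as both a false positive and a false negative, which is one reason triplet-level scores tend to be much lower than scores for any individual field.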
Abstract
Protein-ligand bioactivity data published in the literature are essential for drug discovery, yet manual curation struggles to keep pace with rapidly growing literature. Automated bioactivity extraction remains challenging because it requires not only interpreting biochemical semantics distributed across text, tables, and figures, but also reconstructing chemically exact ligand structures (e.g., Markush structures). To address this bottleneck, we introduce BioMiner, a multi-modal extraction framework that explicitly separates bioactivity semantic interpretation from ligand structure construction. Within BioMiner, bioactivity semantics are inferred through direct reasoning, while chemical structures are resolved via a chemical-structure-grounded visual semantic reasoning paradigm, in which multi-modal large language models operate on chemically grounded visual representations to infer…
