BioTool: A Comprehensive Tool-Calling Dataset for Enhancing Biomedical Capabilities of Large Language Models
Xin Gao, Ruiyi Zhang, Meixi Du, Peijia Qin, Pengtao Xie

TL;DR
BioTool is a new dataset designed to improve biomedical tool-calling in large language models, enabling better performance on specialized biomedical tasks through fine-tuning.
Contribution
It introduces a comprehensive biomedical tool-calling dataset with high-quality query-API pairs, significantly enhancing LLMs' biomedical tool utilization capabilities.
Findings
Fine-tuning a 4-billion-parameter LLM on BioTool improves tool-calling performance.
The BioTool-fine-tuned model outperforms existing models such as GPT-5.1 on biomedical tool-calling tasks.
Expert evaluations show improved answer quality with BioTool-fine-tuned models.
Abstract
Despite the success of large language models (LLMs) on general-purpose tasks, their performance in highly specialized domains such as biomedicine remains unsatisfactory. A key limitation is the inability of LLMs to effectively leverage biomedical tools, which clinical experts and biomedical researchers rely on extensively in daily workflows. While recent general-domain tool-calling datasets have substantially improved the capabilities of LLM agents, existing efforts in the biomedical domain largely rely on in-context learning and restrict models to a small set of tools. To address this gap, we introduce BioTool, a comprehensive biomedical tool-calling dataset designed for fine-tuning LLMs. BioTool comprises 34 frequently used tools collected from the NCBI, Ensembl, and UniProt databases, along with 7,040 high-quality, human-verified query-API call pairs spanning variation, genomics,…
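A query-API call pair links a natural-language question to a structured call against a database API. The sketch below shows one way such a pair could be represented, turned into a concrete HTTP request, and checked against a reference call. It is illustrative only. The `ToolCallPair` schema, the tool names `ensembl_lookup_symbol` and `ncbi_esearch`, and the `exact_match` check are hypothetical and are not BioTool's released format or evaluation metric. The Ensembl REST `lookup/symbol` endpoint and the NCBI E-utilities `esearch` endpoint are real public APIs, but whether BioTool wraps them this way is an assumption.

```python
from dataclasses import dataclass, field
from urllib.parse import quote, urlencode

# Hypothetical tool registry: tool name -> URL template.
# The endpoints are real public REST APIs; the names and mapping are assumptions.
BASE_URLS = {
    "ensembl_lookup_symbol": "https://rest.ensembl.org/lookup/symbol/{species}/{symbol}",
    "ncbi_esearch": "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi",
}


@dataclass
class ToolCallPair:
    """One query-API call pair (hypothetical schema, for illustration only)."""
    query: str                                          # natural-language question
    tool: str                                           # key into BASE_URLS
    path_params: dict = field(default_factory=dict)     # filled into the URL path
    query_params: dict = field(default_factory=dict)    # appended as ?key=value


def render_url(pair: ToolCallPair) -> str:
    """Render a structured tool call as a concrete HTTP GET URL (no request is sent)."""
    template = BASE_URLS[pair.tool]
    # Percent-encode each path segment before substituting it into the template.
    url = template.format(**{k: quote(str(v)) for k, v in pair.path_params.items()})
    if pair.query_params:
        url += "?" + urlencode(pair.query_params)
    return url


def exact_match(pred: ToolCallPair, gold: ToolCallPair) -> bool:
    """Simple correctness check (an assumption): same tool and identical arguments."""
    return (
        pred.tool == gold.tool
        and pred.path_params == gold.path_params
        and pred.query_params == gold.query_params
    )


example = ToolCallPair(
    query="What is the Ensembl gene ID of human BRCA2?",
    tool="ensembl_lookup_symbol",
    path_params={"species": "homo_sapiens", "symbol": "BRCA2"},
    query_params={"content-type": "application/json"},
)
print(render_url(example))
# → https://rest.ensembl.org/lookup/symbol/homo_sapiens/BRCA2?content-type=application%2Fjson
```

Keeping the call structured (tool name plus argument dictionaries) rather than storing a raw URL string makes it straightforward to compare a model's predicted call with a human-verified reference argument by argument.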
