BioTool: A Comprehensive Tool-Calling Dataset for Enhancing Biomedical Capabilities of Large Language Models

Xin Gao; Ruiyi Zhang; Meixi Du; Peijia Qin; Pengtao Xie

arXiv:2605.05758 · cs.CL · May 8, 2026

TL;DR

BioTool is a new dataset for fine-tuning large language models to call biomedical tools, improving their performance on specialized biomedical tasks.

Contribution

The paper introduces a comprehensive biomedical tool-calling dataset of high-quality, human-verified query-API call pairs, which substantially improves LLMs' ability to use biomedical tools.

Findings

01

Fine-tuning a 4-billion-parameter LLM (Qwen3-4B) on BioTool improves its tool-calling performance.

02

A model fine-tuned on BioTool outperforms existing models such as GPT-5.1 on biomedical tool-calling tasks.

03

Expert evaluations show improved answer quality with BioTool-fine-tuned models.

Abstract

Despite the success of large language models (LLMs) on general-purpose tasks, their performance in highly specialized domains such as biomedicine remains unsatisfactory. A key limitation is the inability of LLMs to effectively leverage biomedical tools, which clinical experts and biomedical researchers rely on extensively in daily workflows. While recent general-domain tool-calling datasets have substantially improved the capabilities of LLM agents, existing efforts in the biomedical domain largely rely on in-context learning and restrict models to a small set of tools. To address this gap, we introduce BioTool, a comprehensive biomedical tool-calling dataset designed for fine-tuning LLMs. BioTool comprises 34 frequently used tools collected from the NCBI, Ensembl, and UniProt databases, along with 7,040 high-quality, human-verified query-API call pairs spanning variation, genomics,…
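The abstract describes BioTool as query-API call pairs over tools from NCBI, Ensembl, and UniProt. As a minimal sketch of what such a pair might look like, assuming a schema that this page does not specify, the Python below stores a query together with a structured call, renders the call to a REST URL, and scores predicted calls by exact match. The field names, the tool identifiers (ensembl_lookup_symbol, uniprot_search, ncbi_esearch), and the exact-match criterion are all hypothetical. The URL templates are the public Ensembl, UniProt, and NCBI E-utilities REST endpoints, but whether BioTool wraps these particular endpoints is an assumption.

```python
from dataclasses import dataclass, field
from urllib.parse import quote, urlencode

# Hypothetical tool registry: the names are illustrative, not BioTool's actual tool IDs.
# The URL templates are the public REST endpoints of Ensembl, UniProt and NCBI E-utilities.
TOOLS = {
    "ensembl_lookup_symbol": "https://rest.ensembl.org/lookup/symbol/{species}/{symbol}",
    "uniprot_search": "https://rest.uniprot.org/uniprotkb/search",
    "ncbi_esearch": "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi",
}


@dataclass
class ApiCall:
    """A structured tool call (hypothetical schema)."""
    tool: str
    path_params: dict = field(default_factory=dict)
    query_params: dict = field(default_factory=dict)


@dataclass
class QueryApiPair:
    """One natural-language query paired with its reference API call."""
    query: str
    call: ApiCall


def render_url(call: ApiCall) -> str:
    """Fill the tool's URL template and append sorted, URL-encoded query parameters."""
    template = TOOLS[call.tool]  # raises KeyError for an unknown tool name
    safe_path = {k: quote(str(v), safe="") for k, v in call.path_params.items()}
    url = template.format(**safe_path)
    if call.query_params:
        url += "?" + urlencode(sorted(call.query_params.items()))
    return url


def exact_match(pred: ApiCall, gold: ApiCall) -> bool:
    """Assumed criterion: same tool and identical path and query arguments."""
    return pred == gold  # dataclass equality compares all three fields


def call_accuracy(preds: list, golds: list) -> float:
    """Fraction of predicted calls that exactly match their reference call."""
    if len(preds) != len(golds):
        raise ValueError("preds and golds must have the same length")
    if not golds:
        return 0.0
    return sum(exact_match(p, g) for p, g in zip(preds, golds)) / len(golds)


gold = QueryApiPair(
    query="What is the Ensembl gene ID of human BRCA2?",
    call=ApiCall(
        "ensembl_lookup_symbol",
        {"species": "homo_sapiens", "symbol": "BRCA2"},
        {"content-type": "application/json"},
    ),
)
print(render_url(gold.call))
# https://rest.ensembl.org/lookup/symbol/homo_sapiens/BRCA2?content-type=application%2Fjson
```

Comparing structured arguments rather than raw URL strings keeps the score insensitive to parameter order; a real evaluation might also normalize values such as species names or gene-symbol case.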

Code & Models

Repositories

gxx27/BioTool (GitHub)

Models

gxx27/BioTool-finetuned-Qwen3-4B (Hugging Face model; 35 downloads, 1 like)

Datasets

gxx27/BioTool (dataset; 132 downloads)
