Aligning LLMs with Biomedical Knowledge using Balanced Fine-Tuning

Zhenchao Tang; Fang Wang; Haohuai He; Jiale Zhou; Tianxu Lv; Jun Zhu; Shouzhi Chen; Minghao Yang; Yu Wang; Jiayang Wu; Yidong Song; Yaokun Li; Jiehui Huang; Bing He; Jianhua Yao

arXiv:2511.21075·cs.LG·May 5, 2026

TL;DR

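This paper introduces Balanced Fine-Tuning (BFT), a post-training method that aligns large language models with biomedical knowledge by reweighting tokens and shifting training emphasis toward knowledge-dense samples. It reports more consistent gains than SFT and DFT across medical evaluation, biological reasoning, sparse-reward RL, and biological representation tasks.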

Contribution

The paper proposes BFT, a dual-scale post-training technique that combines group-normalized token reweighting with sequence-level reallocation toward samples exhibiting dense epistemic uncertainty. Under a shared training setup, it provides more consistent gains than SFT and DFT.

Findings

01 BFT yields more consistent gains than SFT and DFT across multiple biomedical tasks.

02 Replacing backbones with BFT-aligned models improves biological reasoning and chemical prediction.

03 BFT variants further improve after sparse reward fine-tuning, unlike SFT and DFT.

Abstract

Engineering LLMs to accelerate life sciences research requires a robust alignment with biomedical knowledge. We observe that biomedical text exhibits a fundamentally different uncertainty structure from general text: dense low-confidence runs encode epistemic knowledge gaps (dense causal chains, rare entities) rather than the sparse aleatoric stylistic variation typical of general text. Based on this discovery, we propose Balanced Fine-Tuning (BFT), a dual-scale post-training method that combines group-normalized token reweighting with sequence-level reallocation toward knowledge-dense samples exhibiting dense epistemic uncertainty. Across medical evaluation, biological reasoning, sparse-reward RL, and biological representation tasks, BFT provides more consistent gains than SFT and DFT under a shared training setup. When replacing the default closed-source backbones in GeneAgent…
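The abstract names two scales of reweighting but this page does not give the formulas. The sketch below is only an illustration of how a dual-scale weighting of this general shape could be wired together, not the paper's actual objective. Every name and choice in it is an assumption made for illustration: the function names, the use of token confidence as the raw token weight, the treatment of one sequence as the normalization group, the 0.5 low-confidence threshold, and the `1 + density` sequence score.

```python
import math

# Illustrative only: the weighting functions below are assumptions,
# not the formulas used by BFT.

def token_weights(probs):
    """Token-scale weights, normalized within one sequence to mean 1.

    Assumption: the raw weight is the model's confidence p_t in the
    target token, and the "group" is the sequence itself.
    probs must be non-empty with every value in (0, 1].
    """
    mean_p = sum(probs) / len(probs)
    return [p / mean_p for p in probs]

def sequence_weights(batch_probs, threshold=0.5):
    """Sequence-scale weights, normalized across the batch to mean 1.

    Assumption: a sequence counts as more knowledge-dense when a larger
    fraction of its tokens falls below the confidence threshold.
    """
    density = [sum(p < threshold for p in probs) / len(probs)
               for probs in batch_probs]
    raw = [1.0 + d for d in density]
    mean_raw = sum(raw) / len(raw)
    return [r / mean_raw for r in raw]

def balanced_loss(batch_probs, threshold=0.5):
    """Weighted negative log-likelihood over a batch.

    The weights are treated as constants here; in a real training loop
    they would be computed without gradient tracking.
    """
    seq_w = sequence_weights(batch_probs, threshold)
    total = 0.0
    for probs, sw in zip(batch_probs, seq_w):
        tok_w = token_weights(probs)
        nll = [-math.log(p) for p in probs]
        seq_loss = sum(w * l for w, l in zip(tok_w, nll)) / len(probs)
        total += sw * seq_loss
    return total / len(batch_probs)

# Hypothetical target-token probabilities for two sequences:
# the first is mostly confident, the second has a low-confidence run.
batch = [[0.9, 0.8, 0.9], [0.2, 0.3, 0.9]]
seq_w = sequence_weights(batch)
loss = balanced_loss(batch)
```

In this toy batch the second sequence receives the larger sequence weight because two of its three tokens fall below the threshold. Normalizing both scales to mean 1 keeps the overall loss magnitude comparable to an unweighted average.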
