BMRetriever: Tuning Large Language Models as Better Biomedical Text Retrievers
Ran Xu, Wenqi Shi, Yue Yu, Yuchen Zhuang, Yanqiao Zhu, May D. Wang, Joyce C. Ho, Chao Zhang, Carl Yang

TL;DR
BMRetriever introduces a series of dense biomedical retrievers trained with unsupervised pre-training followed by instruction fine-tuning, improving performance on biomedical retrieval tasks with high parameter efficiency and broad applicability.
Contribution
The paper presents BMRetriever, a novel biomedical retrieval model series that leverages unsupervised pre-training and instruction fine-tuning, achieving high performance with fewer parameters.
Findings
The 410M variant of BMRetriever outperforms baselines up to 11.7 times larger.
The 2B variant matches the performance of models with over 5B parameters.
Models are publicly available for reproducibility and further research.
Abstract
Developing effective biomedical retrieval models is important for excelling at knowledge-intensive biomedical tasks but still challenging due to the deficiency of sufficient publicly annotated biomedical data and computational resources. We present BMRetriever, a series of dense retrievers for enhancing biomedical retrieval via unsupervised pre-training on large biomedical corpora, followed by instruction fine-tuning on a combination of labeled datasets and synthetic pairs. Experiments on 5 biomedical tasks across 11 datasets verify BMRetriever's efficacy on various biomedical applications. BMRetriever also exhibits strong parameter efficiency, with the 410M variant outperforming baselines up to 11.7 times larger, and the 2B variant matching the performance of models with over 5B parameters. The training data and model checkpoints are released at https://huggingface.co/BMRetriever…
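For readers unfamiliar with the term, a dense retriever ranks passages by comparing embedding vectors rather than matching keywords. The sketch below illustrates only that general ranking step. It is not BMRetriever's code: the vectors are hand-written toy values rather than model outputs, cosine similarity is an assumed scoring function, and the helper names `cosine` and `rank_passages` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def rank_passages(query_vec, passage_vecs):
    """Return (passage_index, score) pairs sorted from best to worst match."""
    scores = [(idx, cosine(query_vec, vec)) for idx, vec in enumerate(passage_vecs)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy embeddings (assumption: in practice an encoder model would produce these).
query = [0.9, 0.1, 0.0]
passages = [
    [0.0, 1.0, 0.0],  # passage 0: mostly unrelated direction
    [1.0, 0.0, 0.0],  # passage 1: closest to the query
    [0.5, 0.5, 0.0],  # passage 2: partially related
]

ranking = rank_passages(query, passages)
for idx, score in ranking:
    print(f"passage {idx}: {score:.3f}")
```

In a real system the query and passages would be encoded by the trained model and the search would typically run over an approximate nearest-neighbor index instead of a full sort.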
Taxonomy
Topics: Biomedical Text Mining and Ontologies · Topic Modeling
