A Systematic Study of Biomedical Retrieval Pipeline Trade-offs in Performance and Efficiency
Hayk Stepanyan, Matthew McDermott

TL;DR
This paper empirically analyzes biomedical retrieval pipeline choices, providing practical guidance on optimizing performance and efficiency across various datasets and query types.
Contribution
It offers systematic insights into retrieval pipeline design, highlighting effective corpus aggregation, indexing strategies, and chunking methods for biomedical information retrieval.
Findings
Corpus aggregation improves retrieval quality.
MedRAG/pubmed is Pareto-optimal for biomedical retrieval.
FAISS indexing offers favorable trade-offs between retrieval performance and efficiency.
Abstract
Retrieval systems are increasingly used in biomedical and clinical natural language processing applications, yet practical guidance for researchers building such systems is limited. In this work, we provide such guidance through an empirical study of how retrieval pipeline design choices affect performance and efficiency at scale. In particular, we examine retrieval over a variety of existing public biomedical text datasets, using several distinct types of queries, including exam-style questions, conversational medical queries, community-asked questions, and non-question formulations. We vary retrieval pipeline settings spanning corpus selection, chunk granularity, and vector index configuration. Retrieval results are judged using a robust win-rate comparison in an LLM-as-a-judge setting with human validation. Across these experiments, we identify…
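The abstract describes judging retrieval results by win-rate comparison. As a minimal sketch of how such a tally might work, the snippet below aggregates pairwise verdicts into a win rate per pipeline configuration. The function name `win_rates`, the configuration labels, the verdicts, and the choice to count a tie as half a win for each side are all assumptions made for illustration; the paper's actual judging protocol is not specified in this summary.

```python
from collections import Counter


def win_rates(judgments):
    """Aggregate pairwise verdicts into a win rate per configuration.

    judgments: iterable of (config_a, config_b, winner) tuples, where
    winner is config_a, config_b, or None for a tie.
    Assumption of this sketch: a tie counts as half a win for each side.
    """
    wins = Counter()
    comparisons = Counter()
    for config_a, config_b, winner in judgments:
        if winner is not None and winner not in (config_a, config_b):
            raise ValueError(f"winner {winner!r} is not one of the compared configs")
        comparisons[config_a] += 1
        comparisons[config_b] += 1
        if winner is None:
            wins[config_a] += 0.5
            wins[config_b] += 0.5
        else:
            wins[winner] += 1
    return {config: wins[config] / comparisons[config] for config in comparisons}


# Hypothetical verdicts comparing two made-up index configurations.
example_judgments = [
    ("flat", "ivf", "flat"),
    ("flat", "ivf", None),
    ("flat", "ivf", "ivf"),
    ("flat", "ivf", "flat"),
]
rates = win_rates(example_judgments)
print(rates)
```

In practice the verdicts would come from the LLM judge (with a human-validated subset), and the resulting win rates could be set against latency or memory cost to compare configurations.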
