Ontology-Guided Query Expansion for Biomedical Document Retrieval using Large Language Models
Zabir Al Nazi, Vagelis Hristidis, Aaron Lawson McLean, Jannat Ara Meem, Md Taukir Azam Chowdhury

TL;DR
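This paper introduces BMQExpander, an ontology-guided query expansion method that combines definitions and relationships from the UMLS Metathesaurus with large language models to improve biomedical document retrieval. It outperforms sparse, dense, and query expansion baselines on NFCorpus, TREC-COVID, and SciFact, and remains robust under query perturbation.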
Contribution
The paper presents a novel ontology-aware query expansion pipeline combining UMLS knowledge with LLMs, demonstrating superior retrieval performance and robustness in biomedical IR tasks.
Findings
Up to 22.1% improvement in NDCG@10 over sparse baselines
Up to 6.5% improvement over the strongest baseline
Robust generalization under query perturbation settings
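For readers unfamiliar with the metric behind these figures, NDCG@10 can be sketched as follows. This is a minimal illustration, not the paper's evaluation code: it assumes linear gain (some variants use 2^rel - 1 instead), and the function names and the optional `all_relevances` parameter are hypothetical.

```python
import math
from typing import Optional, Sequence


def dcg_at_k(relevances: Sequence[float], k: int = 10) -> float:
    """Discounted cumulative gain over the top-k graded relevance labels."""
    # rank is 0-based, so the discount for the first result is log2(2) = 1.
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(
    ranked_relevances: Sequence[float],
    all_relevances: Optional[Sequence[float]] = None,
    k: int = 10,
) -> float:
    """NDCG@k: DCG of the system ranking divided by DCG of the ideal ranking.

    ranked_relevances: labels of the retrieved documents, in ranked order.
    all_relevances: labels of every judged document for the query (assumed
        parameter); defaults to the retrieved labels when omitted.
    """
    pool = ranked_relevances if all_relevances is None else all_relevances
    ideal_dcg = dcg_at_k(sorted(pool, reverse=True), k)
    if ideal_dcg == 0:
        return 0.0  # no relevant documents judged for this query
    return dcg_at_k(ranked_relevances, k) / ideal_dcg
```

A ranking that already places documents in descending order of relevance scores 1.0; pushing a relevant document further down the list lowers the score.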
Abstract
Effective Question Answering (QA) on large biomedical document collections requires effective document retrieval techniques. The latter remains a challenging task due to the domain-specific vocabulary and semantic ambiguity in user queries. We propose BMQExpander, a novel ontology-aware query expansion pipeline that combines medical knowledge - definitions and relationships - from the UMLS Metathesaurus with the generative capabilities of large language models (LLMs) to enhance retrieval effectiveness. We implemented several state-of-the-art baselines, including sparse and dense retrievers, query expansion methods, and biomedical-specific solutions. We show that BMQExpander has superior retrieval performance on three popular biomedical Information Retrieval (IR) benchmarks: NFCorpus, TREC-COVID, and SciFact - with improvements of up to 22.1% in NDCG@10 over sparse baselines and up to…
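The general idea of ontology-guided expansion described in the abstract can be sketched as below. This is an illustrative sketch under stated assumptions, not the authors' pipeline: `TOY_ONTOLOGY` is a hypothetical in-memory stand-in for UMLS, the substring matching in `find_concepts` is a simplification, and `llm` is any caller-supplied function that maps a prompt to text (a stub is used here).

```python
from typing import Callable, Dict, List

# Hypothetical toy ontology standing in for UMLS; entries are illustrative only.
TOY_ONTOLOGY: Dict[str, dict] = {
    "myocardial infarction": {
        "definition": "Necrosis of heart muscle caused by interrupted blood supply.",
        "related": ["heart attack", "coronary artery disease"],
    },
    "hypertension": {
        "definition": "Persistently elevated arterial blood pressure.",
        "related": ["high blood pressure"],
    },
}


def find_concepts(query: str, ontology: Dict[str, dict]) -> List[str]:
    """Return ontology terms that appear verbatim in the query (simplified matching)."""
    lowered = query.lower()
    return [term for term in ontology if term in lowered]


def build_prompt(query: str, concepts: List[str], ontology: Dict[str, dict]) -> str:
    """Assemble a prompt that gives the LLM definitions and relations as context."""
    lines = [f"Query: {query}", "Ontology context:"]
    for term in concepts:
        entry = ontology[term]
        lines.append(f"- {term}: {entry['definition']}")
        lines.append(f"  related: {', '.join(entry['related'])}")
    lines.append("Write a short passage that answers the query.")
    return "\n".join(lines)


def expand_query(
    query: str,
    llm: Callable[[str], str],
    ontology: Dict[str, dict] = TOY_ONTOLOGY,
) -> str:
    """Append LLM-generated text, conditioned on ontology context, to the query."""
    concepts = find_concepts(query, ontology)
    if not concepts:
        return query  # nothing to ground the expansion on; leave the query as-is
    generated = llm(build_prompt(query, concepts, ontology))
    return f"{query} {generated}".strip()


def stub_llm(prompt: str) -> str:
    """Placeholder for a real LLM call; returns fixed text for demonstration."""
    return "heart attack coronary artery disease"
```

Passing the expanded string to a sparse retriever such as BM25 is one way such a pipeline could be used; returning the query unchanged when no concept matches keeps the expansion from drifting without ontology grounding.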
Taxonomy
Topics: Semantic Web and Ontologies · Biomedical Text Mining and Ontologies
