Graph-Based Retriever Captures the Long Tail of Biomedical Knowledge
Julien Delile, Srayanta Mukherjee, Anton Van Pamel, Leonid Zhukov

TL;DR
This paper introduces a knowledge graph-based retrieval method that better captures rare biomedical knowledge. Combined with existing embedding-similarity retrieval, it improves retrieval performance and biomedical question answering.
Contribution
A novel knowledge graph-driven retrieval technique is proposed to address the long-tail knowledge capture issue in biomedical literature, outperforming traditional similarity-based methods.
Findings
Knowledge graph retrieval doubles performance over embedding similarity.
Hybrid models outperform individual retrieval methods.
Improved retrieval enhances biomedical question-answering capabilities.
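To make the hybrid idea concrete, here is a minimal Python sketch under loudly stated assumptions. It is not the authors' implementation. The toy document vectors, the entity-to-document index, the round-robin selection over entities, and the interleaving merge are all hypothetical choices made for illustration, as are the function names `embedding_retrieve`, `graph_retrieve`, and `hybrid_retrieve`.

```python
import math
from itertools import zip_longest

# Toy corpus (hypothetical): 2-d "embeddings" and an entity -> documents index
# standing in for a knowledge graph. d3 is the only document about drugB.
DOC_VECS = {"d1": [1.0, 0.0], "d2": [0.9, 0.1], "d3": [0.0, 1.0], "d4": [0.8, 0.2]}
ENTITY_TO_DOCS = {"geneA": ["d1", "d2", "d4"], "drugB": ["d3"]}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def embedding_retrieve(query_vec, doc_vecs, k):
    """Return the k document ids most similar to the query vector."""
    ranked = sorted(doc_vecs, key=lambda d: cosine(query_vec, doc_vecs[d]), reverse=True)
    return ranked[:k]


def graph_retrieve(query_entities, entity_to_docs, k):
    """Pick documents round-robin across query entities (an assumed strategy),
    so a rarely mentioned entity is not crowded out by a frequent one."""
    queues = [entity_to_docs.get(e, []) for e in query_entities]
    picked, depth = [], 0
    while len(picked) < k and any(depth < len(q) for q in queues):
        for q in queues:
            if depth < len(q) and q[depth] not in picked:
                picked.append(q[depth])
                if len(picked) == k:
                    break
        depth += 1
    return picked


def hybrid_retrieve(query_vec, query_entities, doc_vecs, entity_to_docs, k):
    """Interleave graph and embedding results, dropping duplicates."""
    by_graph = graph_retrieve(query_entities, entity_to_docs, k)
    by_embedding = embedding_retrieve(query_vec, doc_vecs, k)
    merged = []
    for pair in zip_longest(by_graph, by_embedding):
        for doc_id in pair:
            if doc_id is not None and doc_id not in merged:
                merged.append(doc_id)
    return merged[:k]


query_vec, query_entities = [1.0, 0.0], ["geneA", "drugB"]
print(embedding_retrieve(query_vec, DOC_VECS, 3))
print(hybrid_retrieve(query_vec, query_entities, DOC_VECS, ENTITY_TO_DOCS, 3))
```

In this toy setup the embedding-only ranking fills its top three slots with documents about the frequent entity, while the hybrid list also includes the single document about the rare entity. That is the long-tail effect described above, shown on made-up data only.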
Abstract
Large language models (LLMs) are transforming the way information is retrieved with vast amounts of knowledge being summarized and presented via natural language conversations. Yet, LLMs are prone to highlight the most frequently seen pieces of information from the training set and to neglect the rare ones. In the field of biomedical research, latest discoveries are key to academic and industrial actors and are obscured by the abundance of an ever-increasing literature corpus (the information overload problem). Surfacing new associations between biomedical entities, e.g., drugs, genes, diseases, with LLMs becomes a challenge of capturing the long-tail knowledge of the biomedical scientific production. To overcome this challenge, Retrieval Augmented Generation (RAG) has been proposed to alleviate some of the shortcomings of LLMs by augmenting the prompts with context retrieved from…
Taxonomy
Topics: Biomedical Text Mining and Ontologies · Genetics, Bioinformatics, and Biomedical Research · Bioinformatics and Genomic Networks
Methods: Sparse Evolutionary Training · WordPiece · Linear Warmup With Linear Decay · Dropout · Linear Layer · Weight Decay · Byte Pair Encoding · Attention Dropout · Dense Connections
