Overview of TREC 2024 Biomedical Generative Retrieval (BioGen) Track
Deepak Gupta, Dina Demner-Fushman, William Hersh, Steven Bedrick, and Kirk Roberts

TL;DR
The paper discusses the TREC 2024 Biomedical Generative Retrieval (BioGen) Track, focusing on evaluating and improving the grounding of large language models in reliable biomedical sources to reduce hallucinations and false information.
Contribution
It introduces a pilot task on reference attribution to help mitigate false statements generated by LLMs in biomedical question answering.
Findings
Highlights the challenge of hallucinations in biomedical LLMs
Proposes reference attribution as a way to improve factual grounding
Sets up evaluation approaches for biomedical LLM reliability
Abstract
With the advancement of large language models (LLMs), the biomedical domain has seen significant progress in multiple tasks such as biomedical question answering, lay-language summarization of the biomedical literature, and clinical note summarization. However, hallucinations or confabulations remain one of the key challenges when using LLMs in the biomedical and other domains. Inaccuracies may be particularly harmful in high-risk situations, such as medical question answering, making clinical decisions, or appraising biomedical research. Studies evaluating LLMs' abilities to ground generated statements in verifiable sources have shown that models perform significantly worse on lay-user-generated questions and often fail to reference relevant sources. This can be problematic when those seeking information want evidence from studies to back up the claims…
Taxonomy
Topics: Biomedical Text Mining and Ontologies
