Enhancing Large Language Models with Domain-specific Retrieval Augment Generation: A Case Study on Long-form Consumer Health Question Answering in Ophthalmology
Aidan Gilson, Xuguang Ai, Thilaka Arunachalam, Ziyou Chen, Ki Xiong Cheong, Amisha Dave, Cameron Duic, Mercy Kibe, Annette Kaminaka, Minali Prasad, Fares Siddig, Maxwell Singer, Wendy Wong, Qiao Jin, Tiarnan D.L. Keenan, Xia Hu, Emily Y. Chew, Zhiyong Lu, Hua Xu

TL;DR
This study demonstrates that retrieval-augmented generation significantly improves the factual accuracy and evidence attribution of large language models in ophthalmology-based consumer health question answering, addressing hallucination issues.
Contribution
It introduces a domain-specific RAG pipeline with 70,000 ophthalmology documents and systematically evaluates its impact on LLM responses in medical question answering.
Findings
RAG reduces hallucinated and erroneous evidence in LLM responses
RAG improves evidence attribution and answer accuracy
Top retrieved documents are frequently used as references in responses
Abstract
Despite the potential of Large Language Models (LLMs) in medicine, they may generate responses lacking supporting evidence or based on hallucinated evidence. While Retrieval Augment Generation (RAG) is a popular way to address this issue, few studies have implemented and evaluated RAG in downstream domain-specific applications. We developed a RAG pipeline with 70,000 ophthalmology-specific documents that retrieves relevant documents to augment LLMs at inference time. In a case study on long-form consumer health questions, we systematically evaluated the responses, including over 500 references, of LLMs with and without RAG on 100 questions with 10 healthcare professionals. The evaluation focuses on factuality of evidence, selection and ranking of evidence, attribution of evidence, and answer accuracy and completeness. LLMs without RAG provided 252 references in total. Of which, 45.3%…
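The retrieve-then-augment step described in the abstract can be illustrated with a minimal sketch. The abstract does not name the retriever, the prompt format, or the models used, so everything here is an assumption for illustration only: a toy bag-of-words cosine retriever stands in for whatever retriever the authors used, the function names (`retrieve`, `build_prompt`) are hypothetical, and the three corpus sentences are placeholders, not documents from the paper's collection.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(question, corpus, k=2):
    """Return indices of up to k documents with nonzero similarity, best first."""
    q = Counter(tokenize(question))
    scored = [(cosine(q, Counter(tokenize(doc))), i) for i, doc in enumerate(corpus)]
    # Sort by descending score; break ties by original document order.
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [i for score, i in scored[:k] if score > 0]


def build_prompt(question, corpus, k=2):
    """Prepend the retrieved documents as numbered sources to the question."""
    hits = retrieve(question, corpus, k)
    context = "\n".join(f"[{n}] {corpus[i]}" for n, i in enumerate(hits, start=1))
    return (
        "Answer the question using only the numbered sources and cite them.\n"
        f"Sources:\n{context}\n"
        f"Question: {question}\nAnswer:"
    )


# Placeholder corpus (illustrative sentences, not from the paper's documents).
corpus = [
    "Glaucoma damages the optic nerve and is often linked to high eye pressure.",
    "Cataract surgery replaces the cloudy lens with an artificial lens.",
    "Dry eye can be managed with artificial tears and warm compresses.",
]

prompt = build_prompt("What is cataract surgery?", corpus, k=1)
```

The resulting `prompt` string would then be passed to an LLM. Numbering the sources in the prompt is one simple way to let a model attribute statements to specific retrieved documents, which is the kind of evidence attribution the evaluation examines.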
Taxonomy
Topics: Topic Modeling · Artificial Intelligence in Healthcare and Education · Health Literacy and Information Accessibility
Methods: Attention Is All You Need · Linear Layer · Attention Dropout · Dense Connections · Multi-Head Attention · Linear Warmup With Linear Decay · Weight Decay · Adam · WordPiece
