Retrieval Augmented Generation for 10 Large Language Models and its Generalizability in Assessing Medical Fitness
Yu He Ke, Liyuan Jin, Kabilan Elangovan, Hairil Rizal Abdullah, Nan Liu, Alex Tiong Heng Sia, Chai Rick Soh, Joshua Yi Min Tung, Jasmine Chiat Ling Ong, Chang-Fu Kuo, Shao-Chun Wu, Vesela P. Kovacheva, Daniel Shu Wei Ting

TL;DR
This study evaluates retrieval-augmented large language models for medical preoperative assessments, demonstrating high accuracy, speed, and consistency across guidelines, with GPT4 achieving 96.4% correctness and no hallucinations.
Contribution
It introduces LLM-RAG models tailored for medical preoperative tasks, showing their effectiveness and generalizability across diverse clinical guidelines.
Findings
GPT4 LLM-RAG achieved 96.4% accuracy in assessments.
Models responded within 20 seconds, faster than clinicians.
Responses were consistent and hallucination-free.
Abstract
Large Language Models (LLMs) show potential for medical applications but often lack specialized clinical knowledge. Retrieval Augmented Generation (RAG) allows LLMs to be customized with domain-specific information, making the approach well suited to healthcare. This study evaluates the accuracy, consistency, and safety of LLM-RAG models in determining fitness for surgery and providing preoperative instructions. We developed LLM-RAG models using 35 local and 23 international preoperative guidelines and tested them against human-generated responses. A total of 3,682 responses were evaluated. Clinical documents were processed using LlamaIndex, and 10 LLMs, including GPT3.5, GPT4, and Claude-3, were assessed. Fourteen clinical scenarios were analyzed, focusing on seven aspects of preoperative instructions. Established guidelines and expert judgment were used to determine correct responses, with human-generated…
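The retrieval step at the heart of an LLM-RAG pipeline can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' LlamaIndex pipeline: it splits guideline text into chunks, ranks chunks against a question using term-frequency cosine similarity (a simple stand-in for the embedding-based retrieval a framework like LlamaIndex performs), and assembles the top-ranked chunks into a prompt for the LLM. The function names (`chunk_text`, `retrieve`, `build_prompt`) and the guideline snippets are hypothetical and are not taken from the study's guideline corpus.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def chunk_text(text: str, max_words: int = 40) -> list[str]:
    """Split a document into consecutive chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the query (hypothetical retriever)."""
    q = Counter(tokenize(query))
    ranked = sorted(chunks, key=lambda c: cosine(q, Counter(tokenize(c))), reverse=True)
    return ranked[:k]


def build_prompt(query: str, context: list[str]) -> str:
    """Combine the retrieved guideline excerpts and the question into one prompt."""
    joined = "\n".join(f"- {c}" for c in context)
    return (
        "Answer using only the guideline excerpts below.\n"
        f"Guidelines:\n{joined}\n\nQuestion: {query}\nAnswer:"
    )


# Hypothetical, illustrative guideline snippets (not from the study's corpus).
guidelines = [
    "Patients should stop eating solid food 6 hours before anaesthesia.",
    "Clear fluids may be taken up to 2 hours before anaesthesia.",
    "Bring a list of current medications to the preoperative clinic.",
]

# Index step: chunk every document, then retrieve for one question.
chunks = [c for doc in guidelines for c in chunk_text(doc)]
question = "How many hours before anaesthesia must solid food stop?"
top = retrieve(question, chunks, k=1)
print(top[0])
print(build_prompt(question, top))
```

Replacing the term-count `cosine` with an embedding model and a vector index is the part a framework such as LlamaIndex automates; the sketch keeps everything in the standard library so it runs as-is, and the resulting prompt would then be sent to whichever LLM is being evaluated.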
Taxonomy
Topics: Artificial Intelligence in Healthcare · Machine Learning in Healthcare · Online Learning and Analytics
Methods: Attention Dropout · Linear Layer · Weight Decay · WordPiece · Linear Warmup With Linear Decay · Dropout · Layer Normalization · Byte Pair Encoding · BERT
