Real-world Deployment and Evaluation of PErioperative AI CHatbot (PEACH) -- a Large Language Model Chatbot for Perioperative Medicine

Yu He Ke, Liyuan Jin, Kabilan Elangovan, Bryan Wen Xi Ong, Chin Yang Oh, Jacqueline Sim, Kenny Wei-Tsen Loh, Chai Rick Soh, Jonathan Ming Hua Cheng, Aaron Kwang Yang Lee, Daniel Shu Wei Ting, Nan Liu, and Hairil Rizal Abdullah

arXiv:2412.18096 · cs.AI · December 25, 2024

Open Access

TL;DR

PEACH, a large language model-based chatbot integrated with local perioperative guidelines, was developed and evaluated in real-world clinical settings, demonstrating high accuracy, safety, and usability for supporting perioperative decision-making.

Contribution

This study presents the first deployment and evaluation of a secure LLM chatbot tailored for perioperative medicine, integrating institutional protocols and assessing real-world clinical performance.

Findings

01. Achieved 97.5% initial accuracy, improving to 97.9% after updates.

02. Hallucinations and deviations were minimal, both below 2%.

03. Clinicians reported that PEACH expedited decision-making in 95% of cases.

Abstract

Large Language Models (LLMs) are emerging as powerful tools in healthcare, particularly for complex, domain-specific tasks. This study describes the development and evaluation of the PErioperative AI CHatbot (PEACH), a secure LLM-based system integrated with local perioperative guidelines to support preoperative clinical decision-making. PEACH was embedded with 35 institutional perioperative protocols in the secure Claude 3.5 Sonnet LLM framework within Pair Chat (developed by the Singapore Government) and tested in a silent deployment with real-world data. Accuracy, safety, and usability were assessed. Deviations and hallucinations were categorized based on potential harm, and user feedback was evaluated using the Technology Acceptance Model (TAM). Updates were made after the initial silent deployment to amend one protocol. In 240 real-world clinical iterations, PEACH achieved a…

Taxonomy

Topics: Artificial Intelligence in Healthcare and Education