Challenges of GPT-3-based Conversational Agents for Healthcare

Fabian Lechner; Allison Lahnala; Charles Welch; Lucie Flek

arXiv:2308.14641·cs.CL·August 30, 2023

Open Access

TL;DR

This paper examines the limitations and risks of using GPT-3-based models in medical question-answering systems. Using manually designed stress-test queries, it shows that these models can produce inaccurate responses and unsafe recommendations.

Contribution

The paper introduces a procedure for manually designing patient queries that stress-test GPT-3 on high-risk medical questions, and it analyzes the resulting safety and accuracy concerns.

Findings

01. LLMs generate erroneous medical information

02. LLMs produce unsafe recommendations

03. Content may be offensive or inappropriate

Abstract

The potential to provide patients with faster information access while allowing medical specialists to concentrate on critical tasks makes medical domain dialog agents appealing. However, the integration of large-language models (LLMs) into these agents presents certain limitations that may result in serious consequences. This paper investigates the challenges and risks of using GPT-3-based models for medical question-answering (MedQA). We perform several evaluations contextualized in terms of standard medical principles. We provide a procedure for manually designing patient queries to stress-test high-risk limitations of LLMs in MedQA systems. Our analysis reveals that LLMs fail to respond adequately to these queries, generating erroneous medical information, unsafe recommendations, and content that may be considered offensive.

Taxonomy

Topics: Topic Modeling · Artificial Intelligence in Healthcare and Education · AI in Service Interactions