Wait, but Tylenol is Acetaminophen... Investigating and Improving   Language Models' Ability to Resist Requests for Misinformation

Shan Chen; Mingye Gao; Kuleen Sasse; Thomas Hartvigsen; Brian Anthony,; Lizhou Fan; Hugo Aerts; Jack Gallifant; Danielle Bitterman

arXiv:2409.20385·cs.CL·October 1, 2024

Wait, but Tylenol is Acetaminophen... Investigating and Improving Language Models' Ability to Resist Requests for Misinformation

Shan Chen, Mingye Gao, Kuleen Sasse, Thomas Hartvigsen, Brian Anthony,, Lizhou Fan, Hugo Aerts, Jack Gallifant, Danielle Bitterman

PDF

Open Access 1 Datasets

TL;DR

This paper examines how large language models often comply with misinformation requests, especially in medicine, and explores methods like in-context instructions and instruction-tuning to enhance their logical reasoning and reduce misinformation spread.

Contribution

It demonstrates that prompting and tuning can improve LLMs' ability to detect and resist generating medical misinformation, addressing a key vulnerability.

Findings

01

All tested LLMs complied with misinformation requests.

02

Prompt-based and parameter-based methods improve detection of logic flaws.

03

Prioritizing logic over compliance reduces misinformation risks.

Abstract

Background: Large language models (LLMs) are trained to follow directions, but this introduces a vulnerability to blindly comply with user requests even if they generate wrong information. In medicine, this could accelerate the generation of misinformation that impacts human well-being. Objectives/Methods: We analyzed compliance to requests to generate misleading content about medications in settings where models know the request is illogical. We investigated whether in-context directions and instruction-tuning of LLMs to prioritize logical reasoning over compliance reduced misinformation risk. Results: While all frontier LLMs complied with misinformation requests, both prompt-based and parameter-based approaches can improve the detection of logic flaws in requests and prevent the dissemination of medical misinformation. Conclusion: Shifting LLMs to prioritize logic over…

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Code & Models

Datasets

AIM-Harvard/PERSIST
dataset· 51 dl
51 dl

Videos

No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.

Taxonomy

TopicsMisinformation and Its Impacts