Exploring the Impact of Instruction-Tuning on LLM's Susceptibility to Misinformation

Kyubeen Han; Junseo Jang; Hongjin Kim; Geunyeong Jeong; Harksoo Kim

arXiv:2507.18203·cs.CL·July 25, 2025

Exploring the Impact of Instruction-Tuning on LLM's Susceptibility to Misinformation

Kyubeen Han, Junseo Jang, Hongjin Kim, Geunyeong Jeong, Harksoo Kim

PDF

Open Access 1 Video

TL;DR

This paper investigates how instruction-tuning affects large language models' tendency to accept misinformation, revealing increased susceptibility and emphasizing the need for mitigation strategies to improve reliability.

Contribution

It provides the first detailed analysis of how instruction-tuning influences LLMs' acceptance of misinformation, highlighting increased reliance on user input.

Findings

01

Instruction-tuned LLMs are more likely to accept misinformation from users.

02

Instruction-tuning shifts susceptibility from assistant to user role.

03

Factors like prompt structure and misinformation length affect susceptibility.

Abstract

Instruction-tuning enhances the ability of large language models (LLMs) to follow user instructions more accurately, improving usability while reducing harmful outputs. However, this process may increase the model's dependence on user input, potentially leading to the unfiltered acceptance of misinformation and the generation of hallucinations. Existing studies primarily highlight that LLMs are receptive to external information that contradict their parametric knowledge, but little research has been conducted on the direct impact of instruction-tuning on this phenomenon. In our study, we investigate the impact of instruction-tuning on LLM's susceptibility to misinformation. Our analysis reveals that instruction-tuned LLMs are significantly more likely to accept misinformation when it is presented by the user. A comparison with base models shows that instruction-tuning increases reliance…

Peer Reviews

No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.

Videos

Exploring the Impact of Instruction-Tuning on LLM’s Susceptibility to Misinformation· underline

Taxonomy

TopicsHate Speech and Cyberbullying Detection