Exploring the Impact of Instruction-Tuning on LLM's Susceptibility to Misinformation
Kyubeen Han, Junseo Jang, Hongjin Kim, Geunyeong Jeong, Harksoo Kim

TL;DR
This paper investigates how instruction-tuning affects large language models' tendency to accept misinformation, revealing increased susceptibility and emphasizing the need for mitigation strategies to improve reliability.
Contribution
It provides the first detailed analysis of how instruction-tuning influences LLMs' acceptance of misinformation, highlighting increased reliance on user input.
Findings
Instruction-tuned LLMs are more likely to accept misinformation from users.
Instruction-tuning shifts susceptibility from assistant to user role.
Factors like prompt structure and misinformation length affect susceptibility.
Abstract
Instruction-tuning enhances the ability of large language models (LLMs) to follow user instructions more accurately, improving usability while reducing harmful outputs. However, this process may increase the model's dependence on user input, potentially leading to the unfiltered acceptance of misinformation and the generation of hallucinations. Existing studies primarily highlight that LLMs are receptive to external information that contradict their parametric knowledge, but little research has been conducted on the direct impact of instruction-tuning on this phenomenon. In our study, we investigate the impact of instruction-tuning on LLM's susceptibility to misinformation. Our analysis reveals that instruction-tuned LLMs are significantly more likely to accept misinformation when it is presented by the user. A comparison with base models shows that instruction-tuning increases reliance…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Videos
Taxonomy
TopicsHate Speech and Cyberbullying Detection
