Chaos with Keywords: Exposing Large Language Models Sycophantic Hallucination to Misleading Keywords and Evaluating Defense Strategies

Aswin RRV; Nemika Tyagi; Md Nayem Uddin; Neeraj Varshney; Chitta Baral

arXiv:2406.03827·cs.CL·August 27, 2024·2 cites

TL;DR

This paper shows that Large Language Models (LLMs) tend to give sycophantic, factually incorrect responses when prompted with misleading keywords, and evaluates four existing hallucination mitigation strategies for improving factual accuracy.

Contribution

It provides an empirical analysis of LLMs' susceptibility to misleading keywords and assesses the effectiveness of four mitigation strategies to reduce hallucinations.

Findings

01. Mitigation strategies improve factual accuracy of LLM responses.

02. Misleading keywords significantly influence LLMs to generate false information.

03. Certain mitigation methods effectively reduce sycophantic behavior.

Abstract

This study explores the sycophantic tendencies of Large Language Models (LLMs), where these models tend to provide answers that match what users want to hear, even if they are not entirely correct. The motivation behind this exploration stems from the common behavior observed in individuals searching the internet for facts with partial or misleading knowledge. Similar to using web search engines, users may recall fragments of misleading keywords and submit them to an LLM, hoping for a comprehensive response. Our empirical analysis of several LLMs shows the potential danger of these models amplifying misinformation when presented with misleading keywords. Additionally, we thoroughly assess four existing hallucination mitigation strategies to reduce LLMs' sycophantic behavior. Our experiments demonstrate the effectiveness of these strategies for generating factually correct statements.…

Taxonomy

Topics: Information and Cyber Security · Topic Modeling