LLM Lies: Hallucinations are not Bugs, but Features as Adversarial Examples

Jia-Yu Yao; Kun-Peng Ning; Zhen-Hui Liu; Mu-Nan Ning; Yu-Yang Liu; Li Yuan

arXiv:2310.01469·cs.CL·August 6, 2024·60 cites

TL;DR

This paper argues that hallucinations in large language models behave like adversarial examples: they can be triggered by nonsensical prompts made of random tokens and by perturbed inputs. It also proposes a defense strategy.

Contribution

It demonstrates that LLM hallucinations can be induced through adversarial prompts and input perturbations, framing hallucinations as adversarial examples and proposing a defense strategy.

Findings

01. Nonsensical prompts can trigger hallucinations in LLMs.

02. Transformers can be manipulated to produce specific tokens.

03. Hallucinations share properties with adversarial examples.

Abstract

Large Language Models (LLMs), including GPT-3.5, LLaMA, and PaLM, seem to be knowledgeable and able to adapt to many tasks. However, we still cannot completely trust their answers, since LLMs suffer from hallucination: fabricating non-existent facts and deceiving users with or without their awareness. Yet the reasons for the existence and pervasiveness of hallucinations remain unclear. In this paper, we demonstrate that nonsensical prompts composed of random tokens can also elicit hallucinated responses from LLMs. Moreover, we provide both theoretical and experimental evidence that transformers can be manipulated to produce specific pre-defined tokens by perturbing their input sequences. This phenomenon forces us to reconsider hallucination as possibly another view of adversarial examples, and it shares similar characteristics with conventional adversarial examples as a…
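
The idea of steering a model toward a chosen token by editing its input can be illustrated with a toy sketch. The code below is not the paper's attack, which the truncated abstract does not describe. It assumes a hypothetical bag-of-tokens linear model with random weights and uses plain greedy token substitution, so every name in it (make_toy_weights, target_log_prob, greedy_token_attack) is illustrative only.

```python
import math
import random

VOCAB_SIZE = 20   # assumed toy vocabulary size
PROMPT_LEN = 6    # assumed toy prompt length


def make_toy_weights(seed=0):
    """Random weights[input_token][output_token]; a stand-in for a real model."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(VOCAB_SIZE)]
            for _ in range(VOCAB_SIZE)]


def target_log_prob(weights, prompt, target):
    """Log-probability of `target` under a softmax over summed token weights."""
    logits = [sum(weights[tok][out] for tok in prompt)
              for out in range(VOCAB_SIZE)]
    peak = max(logits)  # subtract the max for numerical stability
    log_norm = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return logits[target] - log_norm


def greedy_token_attack(weights, prompt, target, sweeps=3):
    """Replace one token at a time, keeping only swaps that raise the score."""
    prompt = list(prompt)
    best = target_log_prob(weights, prompt, target)
    for _ in range(sweeps):
        for pos in range(len(prompt)):
            for candidate in range(VOCAB_SIZE):
                trial = prompt[:pos] + [candidate] + prompt[pos + 1:]
                score = target_log_prob(weights, trial, target)
                if score > best:
                    best, prompt = score, trial
    return prompt, best


weights = make_toy_weights()
rng = random.Random(1)
start = [rng.randrange(VOCAB_SIZE) for _ in range(PROMPT_LEN)]  # random-token prompt
target = 7  # arbitrary pre-defined output token

before = target_log_prob(weights, start, target)
adv_prompt, after = greedy_token_attack(weights, start, target)
print("start prompt:", start, "log p(target) =", round(before, 3))
print("final prompt:", adv_prompt, "log p(target) =", round(after, 3))
```

Because a swap is accepted only when it raises the target log-probability, the final score is never lower than the starting one. Whether the target actually becomes the most likely token depends on the toy weights, and nothing here says how such a search behaves on a real LLM.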

Code & Models

Repositories

pku-yuangroup/hallucination-attack
PyTorch · Official

Taxonomy

Topics: Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI) · Artificial Intelligence in Healthcare and Education

Methods: Multi-Head Attention · Cosine Annealing · Linear Warmup With Cosine Annealing · Layer Normalization · Softmax · Byte Pair Encoding · Dropout