Unfamiliar Finetuning Examples Control How Language Models Hallucinate

Katie Kang; Eric Wallace; Claire Tomlin; Aviral Kumar; Sergey Levine

arXiv:2403.05612 · cs.LG · May 30, 2024 · 1 cites

TL;DR

This paper investigates how unfamiliar finetuning examples shape language model hallucinations and shows that changing how those examples are supervised can steer a model's responses to unfamiliar queries, for example toward saying "I don't know".

Contribution

It reveals that unfamiliar finetuning examples shape hallucinations and proposes methods to control these effects, enhancing model reliability and factual accuracy.

Findings

01 Unfamiliar finetuning examples influence hallucination patterns.

02 Controlling hallucinations in reward models improves factuality.

03 Strategic supervision reduces negative effects of hallucinations in RL finetuning.

Abstract

Large language models are known to hallucinate when faced with unfamiliar queries, but the underlying mechanisms that govern how models hallucinate are not yet fully understood. In this work, we find that unfamiliar examples in the models' finetuning data -- those that introduce concepts beyond the base model's scope of knowledge -- are crucial in shaping these errors. In particular, we find that an LLM's hallucinated predictions tend to mirror the responses associated with its unfamiliar finetuning examples. This suggests that by modifying how unfamiliar finetuning examples are supervised, we can influence a model's responses to unfamiliar queries (e.g., say "I don't know"). We empirically validate this observation in a series of controlled experiments involving SFT, RL, and reward model finetuning on TriviaQA and MMLU. Our work further investigates RL finetuning strategies for…
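The idea of changing how unfamiliar finetuning examples are supervised can be illustrated with a small sketch. This is not the authors' implementation: the `Example` type, the per-example `familiarity` score, the 0.5 threshold, and the `relabel_unfamiliar` helper are all hypothetical names chosen for illustration, and the sketch assumes some external procedure has already estimated how well the base model knows each example.

```python
from dataclasses import dataclass, replace
from typing import List

# Fallback target used for examples the base model is assumed not to know.
IDK = "I don't know"


@dataclass(frozen=True)
class Example:
    query: str
    response: str
    # Hypothetical score in [0, 1]: how well the base model already knows
    # the answer. How to compute it is outside the scope of this sketch.
    familiarity: float


def relabel_unfamiliar(
    examples: List[Example],
    threshold: float = 0.5,  # assumed cutoff, not taken from the paper
    fallback: str = IDK,
) -> List[Example]:
    """Return a copy of the dataset in which every example whose familiarity
    is below the threshold has its target response replaced by the fallback."""
    return [
        replace(ex, response=fallback) if ex.familiarity < threshold else ex
        for ex in examples
    ]


if __name__ == "__main__":
    data = [
        Example("Capital of France?", "Paris", familiarity=0.95),
        Example("Who won the 1923 village chess cup?", "A. Smith", familiarity=0.05),
    ]
    for ex in relabel_unfamiliar(data):
        print(f"{ex.query} -> {ex.response}")
```

Under these assumptions, a model finetuned on the relabeled data sees "I don't know" as the target for its unfamiliar examples, which is the kind of supervision change the abstract suggests would carry over to unfamiliar queries at test time.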

Code & Models

Repositories

katiekang1998/llm_hallucinations
PyTorch · Official

Videos

Unfamiliar Finetuning Examples Control How Language Models Hallucinate · Underline

Taxonomy

Topics: Natural Language Processing Techniques

Methods: Balanced Selection · Shrink and Fine-Tune