Enhanced Language Model Truthfulness with Learnable Intervention and Uncertainty Expression

Farima Fatahi Bayat; Xin Liu; H. V. Jagadish; Lu Wang

arXiv:2405.00301·cs.CL·June 10, 2024

TL;DR

LITO is a learnable intervention method that adaptively enhances language model truthfulness by tuning intervention intensity based on context and uncertainty, improving factual accuracy without sacrificing task performance.

Contribution

The paper introduces LITO, a novel adaptive intervention technique that automatically adjusts intervention strength for truthfulness in language models, outperforming fixed approaches.

Findings

01. LITO improves factual accuracy across multiple LLMs and datasets.

02. LITO maintains task accuracy while increasing truthfulness.

03. Adaptive intervention outperforms fixed strategies in truthfulness enhancement.

Abstract

Large language models (LLMs) can generate long-form and coherent text, yet they often hallucinate facts, which undermines their reliability. To mitigate this issue, inference-time methods steer LLM representations toward the "truthful directions" previously learned for truth elicitation. However, applying these truthful directions with the same intensity fails to generalize across different query contexts. We propose LITO, a Learnable Intervention method for Truthfulness Optimization that automatically identifies the optimal intervention intensity tailored to each specific context. LITO explores a sequence of model generations based on increasing levels of intervention intensities. It selects the most accurate response or refuses to answer when the predictions are highly uncertain. Experiments on multiple LLMs and question-answering datasets demonstrate that LITO improves truthfulness…
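The selection step described in the abstract (generate responses at increasing intervention intensities, then pick the most accurate one or refuse) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the `Candidate` fields, the `accuracy_score` estimate, the `threshold` value, and the refusal string are all hypothetical, and the abstract does not say how accuracy or uncertainty is actually estimated.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical refusal message; the abstract only says the method "refuses to answer".
REFUSAL = "I have no comment."


@dataclass
class Candidate:
    intensity: float       # intervention intensity used for this generation
    response: str          # text generated at that intensity
    accuracy_score: float  # assumed estimate in [0, 1] that the response is accurate


def select_response(candidates: List[Candidate], threshold: float = 0.5) -> str:
    """Return the candidate judged most likely accurate, or refuse.

    Refusal happens when there are no candidates or when even the best
    candidate's score falls below `threshold` (an assumed cutoff standing
    in for "highly uncertain").
    """
    if not candidates:
        return REFUSAL
    best = max(candidates, key=lambda c: c.accuracy_score)
    if best.accuracy_score < threshold:
        return REFUSAL
    return best.response


# Illustrative inputs only; the scores are made up for the example.
candidates = [
    Candidate(intensity=0.0, response="Answer A", accuracy_score=0.30),
    Candidate(intensity=5.0, response="Answer B", accuracy_score=0.80),
    Candidate(intensity=10.0, response="Answer C", accuracy_score=0.55),
]
chosen = select_response(candidates)
```

With a fixed-intensity approach, only one of these candidates would ever be produced; selecting across the sweep is what lets the intensity vary per query.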

Code & Models

Repositories

launchnlp/lito
PyTorch · Official

Videos

Enhanced Language Model Truthfulness with Learnable Intervention and Uncertainty Expression · Underline

Taxonomy

Topics

Reservoir Engineering and Simulation Methods