Adaptive Activation Steering: A Tuning-Free LLM Truthfulness Improvement Method for Diverse Hallucinations Categories

Tianlong Wang; Xianfeng Jiao; Yinghao Zhu; Zhongzhi Chen; Yifan He; Xu Chu; Junyi Gao; Yasha Wang; Liantao Ma

arXiv:2406.00034 · cs.CL · February 27, 2025

Open Access

TL;DR

The paper introduces Adaptive Activation Steering (ACT), a tuning-free method that enhances the truthfulness of large language models by adaptively shifting their activations during inference, effectively reducing hallucinations across various models and scales.

Contribution

We propose ACT, a novel tuning-free activation steering technique that encodes truthfulness as a linear concept and adaptively adjusts model activations to improve factual accuracy.

Findings

01. Significantly improves truthfulness in multiple LLMs (up to 142%).

02. Demonstrates scalability across models from 13B to 65B parameters.

03. Effective across diverse hallucination categories.

Abstract

Recent studies have indicated that Large Language Models (LLMs) harbor an inherent understanding of truthfulness, yet often fail to express it consistently and instead generate false statements. This gap between "knowing" and "telling" poses a challenge for ensuring the truthfulness of generated content. Inspired by recent work on the practice of encoding human-interpretable concepts linearly within large language models, we treat truthfulness as a specially linearly encoded concept within LLMs, and introduce Adaptive Activation Steering (ACT), a tuning-free method that adaptively shifts an LLM's activations in the "truthful" direction during inference. ACT addresses diverse categories of hallucinations by utilizing diverse truthfulness-related steering vectors and adjusting the steering intensity adaptively. Applied as an add-on across various models, ACT significantly improves truthfulness in…
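To make the idea of adaptive steering concrete, here is a minimal sketch, not the authors' implementation. It assumes a steering vector can be taken as the difference between mean activations on truthful and untruthful examples, and that steering intensity can be scaled by a simple logistic probe on the projection onto that vector. The function names, the `midpoint` and `max_alpha` parameters, and the toy 2-D activations are all hypothetical; the sketch also uses a single steering vector, whereas the abstract describes several.

```python
import math


def mean_vector(rows):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def steering_vector(truthful_acts, untruthful_acts):
    """Assumed construction: difference of class means,
    pointing from 'untruthful' toward 'truthful' activations."""
    mu_t = mean_vector(truthful_acts)
    mu_u = mean_vector(untruthful_acts)
    return [t - u for t, u in zip(mu_t, mu_u)]


def truthfulness_score(activation, direction, midpoint):
    """Hypothetical probe: logistic function of the projection of the
    activation onto the steering direction, giving a value in (0, 1)."""
    proj = sum(a * d for a, d in zip(activation, direction))
    return 1.0 / (1.0 + math.exp(-(proj - midpoint)))


def adaptive_steer(activation, direction, midpoint, max_alpha=1.0):
    """Shift the activation along `direction`, shifting more when the
    probe judges the activation to be less truthful."""
    score = truthfulness_score(activation, direction, midpoint)
    alpha = max_alpha * (1.0 - score)
    steered = [a + alpha * d for a, d in zip(activation, direction)]
    return steered, alpha


# Toy 2-D activations (made up for illustration only).
truthful = [[1.0, 0.0], [3.0, 0.0]]
untruthful = [[-1.0, 0.0], [-3.0, 0.0]]
direction = steering_vector(truthful, untruthful)

steered_low, alpha_low = adaptive_steer([-2.0, 0.0], direction, midpoint=0.0)
steered_high, alpha_high = adaptive_steer([2.0, 0.0], direction, midpoint=0.0)
print(direction, alpha_low, alpha_high)
```

In this toy setup the activation that already lies on the "truthful" side receives almost no shift, while the one on the "untruthful" side is moved nearly the full length of the steering vector. In a real model the shift would be applied to hidden states at inference time, for example through a forward hook, which this sketch leaves out.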

Taxonomy

Topics: Mental Health and Psychiatry · Pain Management and Placebo Effect · Hallucinations in medical conditions