Inference-Time Intervention: Eliciting Truthful Answers from a Language Model
Kenneth Li, Oam Patel, Fernanda Viégas, Hanspeter Pfister, Martin Wattenberg

TL;DR
This paper presents Inference-Time Intervention (ITI), a minimally invasive method that shifts model activations during inference to significantly improve the truthfulness of large language models, with minimal data and computational costs.
Contribution
The paper introduces ITI, a novel technique for enhancing LLM truthfulness during inference by adjusting activations, requiring minimal data and computational resources.
Findings
ITI improves the truthfulness of Alpaca, an instruction-finetuned LLaMA, on TruthfulQA from 32.5% to 65.1%.
ITI is minimally invasive and computationally inexpensive.
A tradeoff exists between truthfulness and helpfulness, tunable via intervention strength.
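To make the role of intervention strength concrete, here is a minimal sketch of the shift itself, assuming each attention head's output is a plain vector and that a direction has already been found for each selected head. The names `shift_heads`, `directions`, and `alpha` are illustrative and not taken from the paper's code; a real implementation would operate on model tensors inside the forward pass.

```python
from typing import Dict, List

Vector = List[float]


def shift_heads(
    head_outputs: List[Vector],
    directions: Dict[int, Vector],
    alpha: float,
) -> List[Vector]:
    """Return a copy of head_outputs with the selected heads shifted by alpha * direction.

    head_outputs: one vector per attention head in a layer (toy stand-in).
    directions:   maps head index -> direction vector; heads not listed are untouched.
    alpha:        intervention strength (hypothetical name for the tunable knob).
    """
    shifted = [list(v) for v in head_outputs]  # copy so the input is not mutated
    for head, direction in directions.items():
        shifted[head] = [x + alpha * d for x, d in zip(shifted[head], direction)]
    return shifted


# Toy layer: 4 heads with head dimension 3; only heads 1 and 3 are intervened on.
head_outputs = [[0.0, 0.0, 0.0] for _ in range(4)]
directions = {1: [1.0, 0.0, 0.0], 3: [0.0, 0.6, 0.8]}

# Sweeping alpha is how one would explore the truthfulness/helpfulness tradeoff.
for alpha in (0.0, 5.0, 15.0):
    shifted = shift_heads(head_outputs, directions, alpha)
    print(alpha, shifted[1], shifted[3])
```

With `alpha = 0` the model is unchanged. Larger values push the selected heads further along their directions, which is where the tradeoff with helpfulness would be expected to appear.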
Abstract
We introduce Inference-Time Intervention (ITI), a technique designed to enhance the "truthfulness" of large language models (LLMs). ITI operates by shifting model activations during inference, following a set of directions across a limited number of attention heads. This intervention significantly improves the performance of LLaMA models on the TruthfulQA benchmark. On an instruction-finetuned LLaMA called Alpaca, ITI improves its truthfulness from 32.5% to 65.1%. We identify a tradeoff between truthfulness and helpfulness and demonstrate how to balance it by tuning the intervention strength. ITI is minimally invasive and computationally inexpensive. Moreover, the technique is data efficient: while approaches like RLHF require extensive annotations, ITI locates truthful directions using only a few hundred examples. Our findings suggest that LLMs may have an internal representation of the…
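As an illustration of how a direction might be located from a few hundred labeled examples, the sketch below takes the normalized difference between the mean activation on truthful examples and the mean activation on untruthful ones. This difference-of-means estimator is an assumption made for illustration, since the abstract does not say how directions are computed. The activations are synthetic, and `truthful_direction` is a hypothetical helper name.

```python
import math
import random
from typing import List

Vector = List[float]


def mean_vector(vectors: List[Vector]) -> Vector:
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def truthful_direction(truthful: List[Vector], untruthful: List[Vector]) -> Vector:
    """Unit vector pointing from the untruthful mean to the truthful mean."""
    diff = [t - u for t, u in zip(mean_vector(truthful), mean_vector(untruthful))]
    norm = math.sqrt(sum(d * d for d in diff))
    return [d / norm for d in diff]


# Synthetic stand-in for one head's activations on 300 labeled examples.
# Only the first coordinate carries the (made-up) truthful/untruthful signal.
rng = random.Random(0)


def sample(offset: float) -> Vector:
    return [rng.gauss(offset, 1.0), rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)]


truthful = [sample(+2.0) for _ in range(150)]
untruthful = [sample(-2.0) for _ in range(150)]

direction = truthful_direction(truthful, untruthful)
print(direction)  # dominated by the first coordinate for this synthetic data
```

The point of the sketch is scale rather than fidelity: one mean per class per head is cheap to estimate, which fits the claim that a few hundred examples suffice.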
Code & Models
- 🤗 likenneth/honest_llama2_chat_7B · model · 14 dl · ♡ 9
- 🤗 jujipotle/honest_llama_7B · model · 1 dl
- 🤗 jujipotle/honest_llama2_chat_7B · model · 4 dl
- 🤗 jujipotle/honest_llama2_chat_70B · model
- 🤗 jujipotle/honest_llama3_8B_instruct · model · 13 dl · ♡ 2
- 🤗 jujipotle/honest_llama3_70B_instruct · model · 1 dl · ♡ 1
- 🤗 syed-aliredha/honest_llama3.1_8B_instruct · model
- 🤗 syed-aliredha/creativity-iti-llama31-8b · model
Taxonomy
Topics: Topic Modeling · Multimodal Machine Learning Applications · Natural Language Processing Techniques
