Non-Linear Inference Time Intervention: Improving LLM Truthfulness
Jakub Hoscilowicz, Adam Wiacek, Jan Chojnacki, Adam Cieslak, Leszek Michon, Vitalii Urbanevych, Artur Janicki

TL;DR
This paper introduces Non-Linear ITI, a novel method for improving LLM truthfulness by biasing internal representations without fine-tuning, leading to significant accuracy gains on multiple-choice benchmarks.
Contribution
The paper proposes Non-Linear ITI, a new non-linear multi-token intervention technique that enhances LLM truthfulness without requiring model fine-tuning.
Findings
Over 16% relative MC1 improvement on TruthfulQA compared with the baseline ITI results
10% relative improvement over the Truth Forest method
Effective biasing of internal representations enhances truthfulness
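The findings above are stated as relative rather than absolute improvements. A minimal sketch of the difference follows; the two scores are hypothetical values chosen only to illustrate the arithmetic and are not results from the paper.

```python
def relative_improvement(baseline: float, new: float) -> float:
    """Return the change of `new` with respect to `baseline`, as a fraction."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (new - baseline) / baseline


# Hypothetical MC1 scores for illustration only (NOT taken from the paper).
baseline_mc1 = 0.50
new_mc1 = 0.58

gain = relative_improvement(baseline_mc1, new_mc1)
absolute_points = (new_mc1 - baseline_mc1) * 100

print(f"relative improvement: {gain:.0%}")          # relative improvement: 16%
print(f"absolute gain: {absolute_points:.0f} pts")  # absolute gain: 8 pts
```

With these made-up numbers, a 16% relative improvement corresponds to 8 absolute accuracy points, which is why the baseline score matters when reading relative figures.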
Abstract
In this work, we explore the internal representation space of LLMs to identify the attention heads that contain the most truthful and accurate information. We further develop the Inference Time Intervention (ITI) framework, which makes it possible to bias an LLM's behavior without fine-tuning. Our improvement consists of non-linear multi-token probing and multi-token intervention: Non-Linear ITI (NL-ITI), which significantly enhances performance on evaluation benchmarks. NL-ITI is tested on diverse multiple-choice datasets, including TruthfulQA, on which we report over 16% relative improvement in MC1 (the accuracy with which the model selects the correct answer) with respect to the baseline ITI results. Moreover, we achieve a 10% relative improvement over the recently released Truth Forest (TrFr) method, which also focuses on improving ITI.
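The probing-then-intervention idea described in the abstract can be sketched roughly as follows. This is an illustrative toy, not the authors' implementation: the activations are synthetic, the probe is a small hand-written MLP, "multi-token" is assumed to mean averaging the last k token positions, and the direction is a class-mean difference scaled by alpha and sigma in the style of ITI. All function names and hyperparameters are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for per-head activations: (samples, heads, tokens, head_dim).
n, n_heads, n_tok, d = 400, 6, 4, 8
labels = rng.integers(0, 2, size=n)  # 1 = truthful, 0 = untruthful
acts = rng.normal(size=(n, n_heads, n_tok, d))
# Demo assumption: only heads 1 and 4 carry a truthfulness signal.
signal = rng.normal(size=d)
for h in (1, 4):
    acts[:, h] += (2 * labels - 1)[:, None, None] * signal * 0.8


def pool_tokens(x, k=2):
    """Multi-token feature (assumed): mean over the last k token positions."""
    return x[..., -k:, :].mean(axis=-2)


def train_mlp_probe(x, y, hidden=16, lr=0.1, steps=300, seed=0):
    """Fit a one-hidden-layer tanh MLP with plain gradient descent."""
    r = np.random.default_rng(seed)
    w1 = r.normal(scale=0.1, size=(x.shape[1], hidden))
    b1 = np.zeros(hidden)
    w2 = r.normal(scale=0.1, size=hidden)
    b2 = 0.0
    for _ in range(steps):
        hdn = np.tanh(x @ w1 + b1)
        p = 1 / (1 + np.exp(-(hdn @ w2 + b2)))
        g = (p - y) / len(y)                 # gradient of mean BCE w.r.t. logits
        gh = np.outer(g, w2) * (1 - hdn**2)  # backprop through tanh
        w2 -= lr * (hdn.T @ g)
        b2 -= lr * g.sum()
        w1 -= lr * (x.T @ gh)
        b1 -= lr * gh.sum(axis=0)
    return lambda z: (np.tanh(z @ w1 + b1) @ w2 + b2) > 0


def truthful_direction(x, y):
    """Unit vector from the untruthful class mean to the truthful class mean."""
    diff = x[y == 1].mean(axis=0) - x[y == 0].mean(axis=0)
    return diff / np.linalg.norm(diff)


def intervene(head_out, direction, sigma, alpha=5.0):
    """Shift a head's output along the truthful direction (ITI-style)."""
    return head_out + alpha * sigma * direction


feats = pool_tokens(acts)  # (n, n_heads, d)
train, val = slice(0, 300), slice(300, None)

# Rank heads by held-out accuracy of their non-linear probe.
val_acc = np.array([
    (train_mlp_probe(feats[train, h], labels[train])(feats[val, h]) == labels[val]).mean()
    for h in range(n_heads)
])
top_heads = np.argsort(val_acc)[-2:]

# Intervene on the best head only, for brevity.
best = int(top_heads[-1])
direction = truthful_direction(feats[train, best], labels[train])
sigma = (feats[train, best] @ direction).std()
shifted = intervene(feats[val, best], direction, sigma)
print("selected heads:", sorted(int(i) for i in top_heads))
```

In a real model the shift would be applied to the selected heads' outputs during the forward pass at inference time; here it is applied to the pooled synthetic features only to show the arithmetic.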
Taxonomy
Topics: Sensor Technology and Measurement Systems
