Non-Linear Inference Time Intervention: Improving LLM Truthfulness

Jakub Hoscilowicz; Adam Wiacek; Jan Chojnacki; Adam Cieslak; Leszek Michon; Vitalii Urbanevych; Artur Janicki

arXiv:2403.18680·cs.CL·June 7, 2024·1 cites

Open Access · 1 Repo

TL;DR

This paper introduces Non-Linear ITI (NL-ITI), a method that improves LLM truthfulness by biasing internal representations at inference time, without fine-tuning. On TruthfulQA it reports over 16% relative MC1 improvement over baseline ITI.

Contribution

The paper proposes Non-Linear ITI, a new non-linear multi-token intervention technique that enhances LLM truthfulness without requiring model fine-tuning.

Findings

01. Over 16% relative accuracy improvement on TruthfulQA
02. 10% relative improvement over Truth Forest method
03. Effective biasing of internal representations enhances truthfulness
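
The improvements listed here are relative, meaning they are expressed as a fraction of the baseline accuracy rather than as a difference in percentage points. The sketch below shows the arithmetic only. The function name and the accuracy values are hypothetical and chosen for illustration; they are not results from the paper.

```python
def relative_improvement(baseline_acc: float, new_acc: float) -> float:
    """Return the gain of new_acc over baseline_acc as a fraction of the baseline."""
    if baseline_acc <= 0:
        raise ValueError("baseline accuracy must be positive")
    return (new_acc - baseline_acc) / baseline_acc


# Hypothetical accuracies, for illustration only (not taken from the paper).
baseline, improved = 0.25, 0.30

print(f"absolute gain: {improved - baseline:.4f}")
print(f"relative gain: {relative_improvement(baseline, improved):.4f}")
```

With these illustrative values, a move from 0.25 to 0.30 is a gain of 5 percentage points in absolute terms but 20% in relative terms, which is why the two kinds of figures should not be compared directly.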

Abstract

In this work, we explore the internal representation space of LLMs to identify attention heads that contain the most truthful and accurate information. We further develop the Inference Time Intervention (ITI) framework, which makes it possible to bias an LLM without fine-tuning. The improvement consists of introducing non-linear multi-token probing and multi-token intervention: Non-Linear ITI (NL-ITI), which significantly enhances performance on evaluation benchmarks. NL-ITI is tested on diverse multiple-choice datasets, including TruthfulQA, on which we report over 16% relative improvement in MC1 (the accuracy with which the model selects the correct answer) with respect to the baseline ITI results. Moreover, we achieve a 10% relative improvement over the recently released Truth Forest (TrFf) method, which also focuses on improving ITI.
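
As a rough illustration of the ideas in the abstract, the sketch below shows an ITI-style intervention on a single attention head, together with the MC1 metric as defined above. It makes several assumptions that the abstract does not confirm: the intervention direction is taken as the difference between mean "truthful" and mean "untruthful" activations, the same shift is applied to every token, and the activations are synthetic. The non-linear probe that NL-ITI uses to select heads is not shown. All function names are hypothetical, and this is not the code from the samsung/nl-iti repository.

```python
import math
import random


def mean_vector(rows):
    """Column-wise mean of a list of equal-length vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def truthful_direction(true_acts, false_acts):
    """Unit vector pointing from the mean 'untruthful' activation to the mean 'truthful' one.

    Assumption: a mean-difference direction; the paper may derive it differently.
    """
    diff = [t - f for t, f in zip(mean_vector(true_acts), mean_vector(false_acts))]
    norm = math.sqrt(sum(d * d for d in diff))
    return [d / norm for d in diff]


def intervene(token_acts, direction, alpha):
    """Shift each token's head activation by alpha along the given direction."""
    return [[a + alpha * d for a, d in zip(row, direction)] for row in token_acts]


def mc1(scores, correct_idx):
    """Fraction of questions where the highest-scored choice is the correct one."""
    hits = sum(1 for row, c in zip(scores, correct_idx) if row.index(max(row)) == c)
    return hits / len(scores)


# Synthetic activations for one head (illustration only, not real model data).
random.seed(0)
true_acts = [[random.gauss(1.0, 1.0) for _ in range(8)] for _ in range(50)]
false_acts = [[random.gauss(-1.0, 1.0) for _ in range(8)] for _ in range(50)]

direction = truthful_direction(true_acts, false_acts)
shifted = intervene(false_acts, direction, alpha=3.0)
```

Because the direction is normalized, the shift moves each activation's projection onto that direction by exactly `alpha`, so `alpha` acts as the intervention strength. In a real setup the shifted activations would be written back into the model's forward pass for the selected heads, and MC1 would be computed from the model's scores over the answer choices.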

Code & Models

Repositories

samsung/nl-iti (PyTorch, official)

Taxonomy

Topics: Sensor Technology and Measurement Systems