Steering Information Utility in Key-Value Memory for Language Model Post-Training

Chunyuan Deng; Ruidi Chang; Hanjie Chen

arXiv:2507.05158 · cs.CL · October 30, 2025


TL;DR

This paper introduces InfoSteer, a lightweight post-training method that guides language models to make better use of knowledge stored in their parameters, improving performance across the Qwen, Gemma, and Llama model families on 15 downstream tasks.

Contribution

The paper proposes InfoSteer, which treats feed-forward network (FFN) layers as associative key-value memory and promotes the use of their stored memory vectors during post-training, through either forward-pass interventions or regularization during backpropagation.

Findings

01. Consistent performance improvements across multiple models and tasks.

02. Steered models allocate information more efficiently, focusing on meaningful tokens.

03. Enhanced ability to adapt to in- and out-of-distribution data.

Abstract

Recent advancements in language models (LMs) have marked a shift toward the growing importance of post-training. Yet, post-training approaches such as supervised fine-tuning (SFT) do not guarantee the effective use of knowledge acquired during pretraining. We therefore introduce InfoSteer, a lightweight method that encourages parametric information utilization in LMs during post-training. Specifically, InfoSteer treats the feed-forward network (FFN) layer as associative key-value memory and promotes the use of stored memory vectors via forward-pass interventions or regularization during backpropagation. This simple guidance during the post-training phase yields consistent performance improvements across diverse model families -- including Qwen, Gemma and Llama -- spanning 15 downstream tasks in both in-distribution (ID) and out-of-distribution (OOD) evaluations. Beyond performance gains, we…
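The key-value reading of an FFN layer mentioned in the abstract can be illustrated with a small sketch. It assumes the common formulation in which each row of the first FFN matrix acts as a key, each row of the second acts as a value (memory vector), and the layer output is a coefficient-weighted sum of values. The abstract does not give InfoSteer's exact intervention or regularizer, so `boost_coeffs` (a coefficient floor) and `steered_loss` (an entropy bonus on memory usage) are hypothetical stand-ins chosen only to show the two styles of steering, not the authors' method.

```python
import math

def relu(z):
    return max(0.0, z)

def ffn_as_memory(x, keys, values):
    """Key-value view of an FFN: out = sum_i a_i * v_i, a_i = relu(x . k_i)."""
    coeffs = [relu(sum(xj * kj for xj, kj in zip(x, k))) for k in keys]
    dim = len(values[0])
    out = [sum(a * v[d] for a, v in zip(coeffs, values)) for d in range(dim)]
    return out, coeffs

def usage_entropy(coeffs, eps=1e-12):
    """Entropy of normalized |a_i|; higher means more memory slots contribute."""
    total = sum(abs(a) for a in coeffs) + eps
    probs = [abs(a) / total for a in coeffs]
    return -sum(p * math.log(p + eps) for p in probs)

def boost_coeffs(coeffs, floor=0.05):
    """Hypothetical forward-pass intervention: lift every coefficient to a floor."""
    return [max(a, floor) for a in coeffs]

def steered_loss(task_loss, coeffs, lam=0.1):
    """Hypothetical regularizer: subtract an entropy bonus from the task loss."""
    return task_loss - lam * usage_entropy(coeffs)

# Toy layer: 3 memory slots, 2-dimensional hidden state (all values illustrative).
x = [1.0, 0.0]
keys = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
values = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]

out, coeffs = ffn_as_memory(x, keys, values)
boosted = boost_coeffs(coeffs)
print(out, coeffs)
print(usage_entropy(coeffs), usage_entropy(boosted))
```

In this toy setup only the first key matches the input, so a single memory vector carries the whole output. Applying the floor spreads a small amount of weight over the other slots, which raises the usage entropy and lowers the regularized loss. A real implementation would apply such terms to the FFN activations of a transformer during fine-tuning.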


Taxonomy

Topics: Natural Language Processing Techniques