LF-Steering: Latent Feature Activation Steering for Enhancing Semantic Consistency in Large Language Models

Jingyuan Yang; Rongjun Li; Weixuan Wang; Ziyu Zhou; Zhiyong Feng; Wei Peng

arXiv:2501.11036 · cs.CL · January 23, 2025

TL;DR

LF-Steering introduces a feature-level activation steering method using sparse autoencoders to improve semantic consistency in large language models, addressing the polysemanticity issue and enabling precise control over model outputs.

Contribution

The paper proposes LF-Steering, a novel approach that maps transformer hidden states into a sparse feature space for better semantic control, advancing beyond component-level steering methods.
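To make the idea of steering in a sparse feature space concrete, here is a minimal sketch. It is not the paper's implementation: the sparse autoencoder weights are random rather than trained, the sizes `d_model` and `d_feat`, the helper names `sae_encode`, `sae_decode`, and `steer`, and the choice to add back the reconstruction residual are all assumptions made for illustration. How LF-Steering selects which features to adjust is not described in this summary, so the feature index is simply passed in.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_feat = 8, 32  # hypothetical hidden size and (overcomplete) feature count

# Hypothetical, randomly initialised SAE weights; a real SAE would be trained
# to reconstruct hidden states under a sparsity penalty.
W_enc = rng.normal(size=(d_model, d_feat)) / np.sqrt(d_model)
b_enc = np.zeros(d_feat)
W_dec = rng.normal(size=(d_feat, d_model)) / np.sqrt(d_feat)
b_dec = np.zeros(d_model)


def sae_encode(h):
    """Map a hidden state to non-negative feature activations (ReLU encoder)."""
    return np.maximum(0.0, (h - b_dec) @ W_enc + b_enc)


def sae_decode(f):
    """Map feature activations back to the hidden-state space."""
    return f @ W_dec + b_dec


def steer(h, feature_idx, delta):
    """Shift one latent feature by `delta` and return the steered hidden state.

    The reconstruction residual is added back (an assumption of this sketch)
    so that everything the SAE fails to capture is left untouched.
    """
    f = sae_encode(h)
    residual = h - sae_decode(f)
    f_steered = f.copy()
    f_steered[..., feature_idx] += delta
    return sae_decode(f_steered) + residual


h = rng.normal(size=d_model)          # stand-in for a transformer hidden state
h_steered = steer(h, feature_idx=3, delta=2.0)
```

In this sketch, changing a single feature moves the hidden state by `delta` times that feature's decoder row, `W_dec[feature_idx]`, which is what distinguishes it from adding one dense vector to an entire layer output.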

Findings

01. Significant improvement in semantic consistency across NLU and NLG tasks.

02. Effective decoupling of features reduces interference during steering.

03. Enhanced performance demonstrated on multiple datasets.

Abstract

Large Language Models (LLMs) often generate inconsistent responses when prompted with semantically equivalent paraphrased inputs. Recently, activation steering, a technique that modulates LLMs' behaviours by adjusting their latent representations during inference time, has been explored to improve the semantic consistency of LLMs. However, these methods typically operate at the model component level, such as layer hidden states or attention head outputs. They face a challenge due to the "polysemanticity issue", where the model components of LLMs typically encode multiple entangled features, making precise steering difficult. To address this challenge, we drill down to feature-level representations and propose LF-Steering, a novel activation steering approach to precisely identify latent feature representations responsible for semantic inconsistency. More specifically, our method maps…

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques

Methods: Softmax · Attention Is All You Need · Sparse Autoencoder