CogSteer: Cognition-Inspired Selective Layer Intervention for Efficiently Steering Large Language Models

Xintong Wang; Jingheng Pan; Liang Ding; Longyue Wang; Longqin Jiang; Xingshan Li; Chris Biemann

arXiv:2410.17714 · cs.CL · June 3, 2025

TL;DR

This paper introduces CogSteer, a cognition-inspired, layer-specific intervention method for steering large language models efficiently and interpretably, improving safety and task performance.

Contribution

It proposes a heuristic, informed by eye-movement correlates of human cognition, for selecting the intervention layer, and introduces an implicit contrastive method for safer LLM deployment.

Findings

01 Effective in reducing toxic outputs during inference

02 Improves task performance with fewer parameters

03 Enhances interpretability of LLMs

Abstract

Large Language Models (LLMs) achieve remarkable performance through pretraining on extensive data. This enables efficient adaptation to diverse downstream tasks. However, the lack of interpretability in their underlying mechanisms limits the ability to effectively steer LLMs for specific applications. In this work, we investigate the intrinsic mechanisms of LLMs from a cognitive perspective using eye movement measures. Specifically, we analyze the layer-wise correlation between human cognitive indicators and LLM representations. Building on these insights, we propose a heuristic approach for selecting the optimal steering layer to modulate LLM semantics. To this end, we introduce an efficient selective layer intervention based on prominent parameter-efficient fine-tuning methods, which conventionally adjust either all layers or only the final layer. Additionally, we present an implicit…
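
The layer-wise correlation analysis mentioned in the abstract can be illustrated with a minimal sketch. Everything here is an assumption for illustration, not the authors' procedure: each layer's representation is assumed to be already reduced to one scalar per token, the eye-movement measure is assumed to be one value per token (for example a fixation duration), and the rule "pick the layer with the largest absolute Pearson correlation" is a stand-in for the paper's heuristic, which the truncated abstract does not specify. The function names are hypothetical.

```python
import math
from typing import List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("sequences must have equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def layerwise_correlation(
    layer_scores: Sequence[Sequence[float]],
    eye_measure: Sequence[float],
) -> List[float]:
    """Correlate each layer's per-token scalar (assumed precomputed) with the eye measure."""
    return [pearson(scores, eye_measure) for scores in layer_scores]


def pick_layer(correlations: Sequence[float]) -> int:
    """Assumed selection rule: index of the layer with the largest |correlation|."""
    return max(range(len(correlations)), key=lambda i: abs(correlations[i]))


# Hypothetical toy data: 5 tokens, 3 layers.
eye_measure = [210.0, 180.0, 320.0, 250.0, 400.0]
layer_scores = [
    [1.0, 2.0, 1.5, 1.0, 2.0],  # layer 0: weakly related
    [2.1, 1.8, 3.2, 2.5, 4.0],  # layer 1: linear in the eye measure
    [4.0, 3.0, 2.0, 1.0, 0.5],  # layer 2: negatively related
]
correlations = layerwise_correlation(layer_scores, eye_measure)
selected = pick_layer(correlations)
```

In this toy setup the second layer (index 1) is selected because its scores are an exact linear function of the eye measure. With real data, how hidden states are reduced to a per-token scalar and which eye-movement measures are used would follow the paper itself.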

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques