WASD: Locating Critical Neurons as Sufficient Conditions for Explaining and Controlling LLM Behavior

Haonan Yu; Junhao Liu; Zhenyu Yan; Haoran Lin; Xin Zhang

arXiv:2603.18474 · cs.CL · April 10, 2026

TL;DR

WASD is a framework that identifies minimal sets of neuron-activation conditions sufficient to explain and control the outputs of large language models, yielding explanations that are more stable and accurate than those from attribution graphs.

Contribution

It introduces a new approach to explaining and controlling LLM behavior: searching for minimal sets of neuron-activation conditions that are sufficient to guarantee a given output.

Findings

01

WASD produces more stable and accurate explanations than attribution graphs.

02

A case study shows that the method can be used to control cross-lingual output generation.

03

Experiments on SST-2 and CounterFact with the Gemma-2-2B model support these results.

Abstract

Precise behavioral control of large language models (LLMs) is critical for complex applications. However, existing methods often incur high training costs, lack natural language controllability, or compromise semantic coherence. To bridge this gap, we propose WASD (unWeaving Actionable Sufficient Directives), a novel framework that explains model behavior by identifying sufficient neural conditions for token generation. Our method represents candidate conditions as neuron-activation predicates and iteratively searches for a minimal set that guarantees the current output under input perturbations. Experiments on SST-2 and CounterFact with the Gemma-2-2B model demonstrate that our approach produces explanations that are more stable, accurate, and concise than conventional attribution graphs. Moreover, through a case study on controlling cross-lingual output generation, we validated the…
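The search described in the abstract can be sketched on a toy network. This is a minimal illustration under stated assumptions, not the authors' implementation: a two-layer ReLU network stands in for Gemma-2-2B, each predicate pins one hidden neuron to the activation it had on the original input, perturbations are a fixed grid of input offsets rather than edits to text, and a set counts as sufficient when the original output token survives on at least a fraction tau of the perturbed inputs. The names ToyMLP, is_sufficient and greedy_sufficient_set are hypothetical.

```python
import itertools
from typing import Dict, List, Optional, Sequence, Set

Vector = List[float]


def relu(v: Sequence[float]) -> Vector:
    return [max(0.0, x) for x in v]


def matvec(m: Sequence[Sequence[float]], v: Sequence[float]) -> Vector:
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def argmax(v: Sequence[float]) -> int:
    # Ties resolve to the lowest index.
    return max(range(len(v)), key=lambda i: v[i])


class ToyMLP:
    """Hypothetical stand-in for an LLM: x -> relu(W1 x) -> W2 h -> token id."""

    def __init__(self, w1: List[Vector], w2: List[Vector]) -> None:
        self.w1, self.w2 = w1, w2

    def hidden(self, x: Sequence[float]) -> Vector:
        return relu(matvec(self.w1, x))

    def predict(self, x: Sequence[float],
                clamp: Optional[Dict[int, float]] = None) -> int:
        h = self.hidden(x)
        # Assumption: a predicate is enforced by pinning neuron j to a fixed value.
        for j, value in (clamp or {}).items():
            h[j] = value
        return argmax(matvec(self.w2, h))


def perturbations(x: Sequence[float], offsets: Sequence[float]) -> List[Vector]:
    """Assumed perturbation set: every combination of per-feature offsets."""
    return [[xi + d for xi, d in zip(x, delta)]
            for delta in itertools.product(offsets, repeat=len(x))]


def is_sufficient(model: ToyMLP, x: Sequence[float], neurons: Set[int],
                  samples: List[Vector], tau: float = 1.0) -> bool:
    """Do the pinned neurons keep the original token on >= tau of the samples?"""
    ref_h = model.hidden(x)
    target = model.predict(x)
    clamp = {j: ref_h[j] for j in neurons}
    hits = sum(model.predict(xp, clamp) == target for xp in samples)
    return hits / len(samples) >= tau


def greedy_sufficient_set(model: ToyMLP, x: Sequence[float],
                          samples: List[Vector], tau: float = 1.0) -> Set[int]:
    """One deletion pass: drop each neuron whose removal keeps the set sufficient."""
    # Pinning every hidden neuron reproduces the original logits, so the
    # starting set is sufficient for any tau <= 1.
    current = set(range(len(model.w1)))
    for j in sorted(current):
        trial = current - {j}
        if is_sufficient(model, x, trial, samples, tau):
            current = trial
    return current


W1 = [[1.0, 0.0], [0.0, 1.0], [0.0, -1.0]]
W2 = [[3.0, 0.0, 0.0], [0.0, 1.0, 1.0]]
X = [1.0, 0.5]
MODEL = ToyMLP(W1, W2)
SAMPLES = perturbations(X, (-1.5, -0.5, 0.0, 0.5, 1.5))

if __name__ == "__main__":
    print(MODEL.predict(X))                          # 0
    print(greedy_sufficient_set(MODEL, X, SAMPLES))  # {0}
```

A single greedy deletion pass returns a small sufficient set but is not guaranteed to find the smallest one. Here it keeps only neuron 0: pinning neurons 1 and 2 alone fails whenever the first input feature is pushed low enough to switch neuron 0 off, while pinning neuron 0 holds the original token across all 25 perturbed inputs.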