NeuronScope: A Multi-Agent Framework for Explaining Polysemantic Neurons in Language Models

Weiqi Liu; Yongliang Miao; Haiyan Zhao; Yanguang Liu; Mengnan Du

arXiv:2601.03671 · cs.CL · January 8, 2026

TL;DR

NeuronScope is a multi-agent, iterative framework for interpreting polysemantic neurons in language models. It decomposes neuron activations into semantic components and produces explanations that correlate more strongly with activations than those of single-pass baselines.

Contribution

The paper presents a multi-agent approach that iteratively refines neuron explanations using activation feedback, targeting polysemanticity in LLMs.

Findings

01 Uncovers hidden polysemantic neurons in language models.

02 Produces explanations with higher activation correlation.

03 Outperforms single-pass interpretation baselines.
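
Finding 02 is stated in terms of activation correlation. The sketch below shows one way such a score could be computed, assuming it is the Pearson correlation between a neuron's recorded activations and the activations predicted from an explanation. That choice is common in automated neuron-explanation work, but the paper's exact metric is not given on this page. The helper name `activation_correlation` and all numbers are hypothetical, made up for illustration.

```python
import math


def activation_correlation(actual, predicted):
    """Pearson correlation between recorded and explanation-predicted activations.

    Hypothetical helper: assumes the score is plain Pearson correlation.
    """
    if len(actual) != len(predicted) or len(actual) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    mean_a = sum(actual) / len(actual)
    mean_p = sum(predicted) / len(predicted)
    cov = sum((a - mean_a) * (p - mean_p) for a, p in zip(actual, predicted))
    var_a = sum((a - mean_a) ** 2 for a in actual)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    if var_a == 0 or var_p == 0:
        return 0.0  # a constant series carries no correlation signal
    return cov / math.sqrt(var_a * var_p)


# Toy data (invented): a neuron that fires on two unrelated concepts.
actual = [0.9, 0.8, 0.0, 0.1, 0.85, 0.0]
single_concept = [1.0, 1.0, 0.0, 0.0, 0.0, 0.0]  # explanation misses the second concept
two_concepts = [1.0, 1.0, 0.0, 0.0, 1.0, 0.0]    # explanation covers both concepts

print(activation_correlation(actual, single_concept))
print(activation_correlation(actual, two_concepts))
```

In this toy case the explanation that covers both concepts scores higher, which is the kind of gap a polysemanticity-aware explanation would be expected to close.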

Abstract

Neuron-level interpretation in large language models (LLMs) is fundamentally challenged by widespread polysemanticity, where individual neurons respond to multiple distinct semantic concepts. Existing single-pass interpretation methods struggle to faithfully capture such multi-concept behavior. In this work, we propose NeuronScope, a multi-agent framework that reformulates neuron interpretation as an iterative, activation-guided process. NeuronScope explicitly deconstructs neuron activations into atomic semantic components, clusters them into distinct semantic modes, and iteratively refines each explanation using neuron activation feedback. Experiments demonstrate that NeuronScope uncovers hidden polysemanticity and produces explanations with significantly higher activation correlation compared to single-pass baselines.
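
The abstract describes grouping atomic components into semantic modes and then refining each explanation with activation feedback. The sketch below illustrates only the control flow of such a process under strong simplifying assumptions. It is not the authors' implementation. The functions `group_into_modes` and `refine`, the pre-assigned mode labels, and the `propose` and `score` callables (stand-ins for an LLM agent and an activation-based scorer) are all hypothetical.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple


def group_into_modes(components: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group (text, mode_label) pairs by label.

    Stand-in for the clustering step: labels are assumed to be given,
    whereas a real system would have to infer them.
    """
    modes: Dict[str, List[str]] = defaultdict(list)
    for text, label in components:
        modes[label].append(text)
    return dict(modes)


def refine(
    initial: str,
    propose: Callable[[str, float], str],
    score: Callable[[str], float],
    max_rounds: int = 3,
) -> Tuple[str, float]:
    """Keep a revised explanation only while its feedback score improves."""
    best, best_score = initial, score(initial)
    for _ in range(max_rounds):
        candidate = propose(best, best_score)  # e.g. an agent rewriting the text
        candidate_score = score(candidate)     # e.g. correlation with activations
        if candidate_score <= best_score:
            break  # no improvement: stop refining this mode
        best, best_score = candidate, candidate_score
    return best, best_score


# Toy usage with invented components and a dummy scorer.
components = [("Paris", "cities"), ("Tokyo", "cities"), ("def", "code")]
modes = group_into_modes(components)
for label, examples in modes.items():
    explanation, final_score = refine(
        initial=f"fires on {label}",
        propose=lambda text, s: text + " tokens",
        score=lambda text: float("tokens" in text),
        max_rounds=3,
    )
    print(label, examples, explanation, final_score)
```

Treating each mode separately means one neuron can end up with several explanations, one per mode, instead of a single blended description.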

Taxonomy

Topics: Explainable Artificial Intelligence (XAI) · Multimodal Machine Learning Applications · Topic Modeling