Neuron-Guided Interpretation of Code LLMs: Where, Why, and How?
Zhe Yin, Xiaodong Gu, Beijun Shen

TL;DR
This paper investigates the internal neuron-level mechanisms of code language models, revealing language-specific and universal neurons, and demonstrates their utility in improving multilingual code tasks.
Contribution
It introduces a neuron-level interpretability framework for code LLMs that identifies language-specific neurons and concept layers, and applies these findings to improve multilingual code generation, clone detection, and summarization.
Findings
Some neurons are specialized for individual programming languages, while a universal subset supports general-purpose generation.
Lower layers encode syntax; middle layers encode shared semantics.
Neuron-guided techniques improve multilingual code tasks.
Abstract
Code language models excel on code intelligence tasks, yet their internal interpretability is underexplored. Existing neuron interpretability techniques from NLP are suboptimal for source code due to programming languages' formal, hierarchical, and executable nature. We empirically investigate code LLMs at the neuron level, localizing language-specific neurons (selectively responsive to one language) and concept layers (feed-forward layers encoding language-agnostic code representations). We analyze Llama-3.1-8B and Qwen2.5-Coder-32B on multilingual inputs in C++, Java, Python, Go, and JavaScript, measuring neuron selectivity and layerwise contributions during generation. We find (1) neurons specialized for individual languages alongside a universal subset supporting general-purpose generation; and (2) lower layers mainly encode language-specific syntax, while middle layers capture…
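To make "neuron selectivity" concrete, here is a minimal sketch of one way such a score could be computed. It is not the paper's method, which this summary does not describe. It assumes we already have, for each neuron, the fraction of tokens on which it fires for each of the five languages, and it swaps in a generic entropy-based score: low entropy across languages suggests a language-specific neuron, near-maximal entropy suggests a universal one. The function names and all thresholds (`low`, `high`, `min_rate`) are hypothetical, chosen only for illustration.

```python
import math

# The five languages named in the abstract.
LANGS = ["C++", "Java", "Python", "Go", "JavaScript"]


def activation_entropy(rates):
    """Shannon entropy (in nats) of activation rates normalized across languages."""
    total = sum(rates)
    if total == 0:
        # A neuron that never fires carries no preference; treat it as
        # maximally unselective.
        return math.log(len(rates))
    probs = [r / total for r in rates]
    return -sum(p * math.log(p) for p in probs if p > 0)


def classify_neuron(rates, low=0.5, high=1.5, min_rate=0.05):
    """Label one neuron from its per-language activation rates.

    Thresholds are illustrative assumptions, not values from the paper.
    The maximum possible entropy for five languages is ln(5), about 1.61.
    """
    if max(rates) < min_rate:
        return "inactive"
    h = activation_entropy(rates)
    if h <= low:
        # Activation mass is concentrated on one language.
        return "specific:" + LANGS[rates.index(max(rates))]
    if h >= high:
        # Activation mass is spread almost evenly over all languages.
        return "universal"
    return "mixed"


if __name__ == "__main__":
    # Made-up activation rates, ordered as in LANGS.
    examples = {
        "fires mostly on Python": [0.01, 0.02, 0.90, 0.01, 0.02],
        "fires on every language": [0.6, 0.6, 0.6, 0.6, 0.6],
        "fires on C++ and Java only": [0.5, 0.5, 0.0, 0.0, 0.0],
    }
    for name, rates in examples.items():
        print(name, "->", classify_neuron(rates))
```

In practice the activation rates would come from hooks on the feed-forward layers of the model while it processes code in each language; that collection step is omitted here because it depends on the model and tooling used.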
Taxonomy
Topics: Software Engineering Research · Topic Modeling · Explainable Artificial Intelligence (XAI)
