Who Reasons in the Large Language Models?
Jie Shao, Jianxin Wu

TL;DR
This paper investigates the internal mechanisms of large language models (LLMs) and hypothesizes that the output projection module (oproj) of multi-head self-attention is primarily responsible for reasoning ability. It supports the hypothesis with a suite of diagnostic tools and empirical evidence.
Contribution
The paper introduces Stethoscope for Networks (SfN), a suite of diagnostic tools for probing the internal behavior of LLMs, and uses it to present evidence that the output projection module plays a central role in reasoning.
Findings
Evidence suggests that the output projection module (oproj) in multi-head self-attention plays a central role in LLM reasoning.
Other modules appear to contribute more to fluent dialogue.
SfN's diagnostic tools are used to probe the roles of individual modules inside LLMs.
Abstract
Despite the impressive performance of large language models (LLMs), the process of endowing them with new capabilities--such as mathematical reasoning--remains largely empirical and opaque. A critical open question is whether reasoning abilities stem from the entire model, specific modules, or are merely artifacts of overfitting. In this work, we hypothesize that the reasoning capabilities in well-trained LLMs are primarily attributed to the output projection module (oproj) in the Transformer's multi-head self-attention (MHSA) mechanism. To support this hypothesis, we introduce Stethoscope for Networks (SfN), a suite of diagnostic tools designed to probe and analyze the internal behaviors of LLMs. Using SfN, we provide both circumstantial and empirical evidence suggesting that oproj plays a central role in enabling reasoning, whereas other modules contribute more to fluent dialogue.…
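To make clear which component the abstract calls oproj, the following is a minimal sketch of multi-head self-attention in plain Python, with the output projection applied as the final step. It is not the authors' code and not part of SfN. The names `w_q`, `w_k`, `w_v`, `w_o`, `attention_heads`, and `mhsa` are illustrative assumptions, and the sketch omits biases, causal masking, and positional encoding.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention_heads(x, w_q, w_k, w_v, n_heads):
    """Return the concatenated per-head outputs, before the output projection."""
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    d_head = len(q[0]) // n_heads
    concat = [[] for _ in x]
    for h in range(n_heads):
        lo, hi = h * d_head, (h + 1) * d_head
        qh = [r[lo:hi] for r in q]
        kh = [r[lo:hi] for r in k]
        vh = [r[lo:hi] for r in v]
        kh_t = [list(c) for c in zip(*kh)]
        # Scaled dot-product attention for this head (no causal mask here).
        scores = [[s / math.sqrt(d_head) for s in row] for row in matmul(qh, kh_t)]
        weights = [softmax(row) for row in scores]
        for i, r in enumerate(matmul(weights, vh)):
            concat[i].extend(r)
    return concat


def mhsa(x, w_q, w_k, w_v, w_o, n_heads):
    """Multi-head self-attention; w_o is the output projection (oproj)."""
    return matmul(attention_heads(x, w_q, w_k, w_v, n_heads), w_o)


def rand_matrix(rows, cols, rng):
    """Hypothetical helper: a random matrix for the toy example."""
    return [[rng.uniform(-1.0, 1.0) for _ in range(cols)] for _ in range(rows)]


# Toy sizes chosen only for illustration.
rng = random.Random(0)
seq_len, d_model, n_heads = 3, 4, 2
x = rand_matrix(seq_len, d_model, rng)
w_q = rand_matrix(d_model, d_model, rng)
w_k = rand_matrix(d_model, d_model, rng)
w_v = rand_matrix(d_model, d_model, rng)
w_o = rand_matrix(d_model, d_model, rng)

y = mhsa(x, w_q, w_k, w_v, w_o, n_heads)
```

In this sketch `w_o` is the only matrix that mixes information across heads after attention has been computed, which is the position in the block that the hypothesis singles out. Replacing `w_o` with an identity matrix returns the raw concatenated head outputs.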
Taxonomy
Topics: Natural Language Processing Techniques
