Disjoint Processing Mechanisms of Hierarchical and Linear Grammars in Large Language Models

Aruna Sankaranarayanan; Dylan Hadfield-Menell; Aaron Mueller

arXiv:2501.08618·cs.CL·January 16, 2025

TL;DR

This paper investigates whether large language models develop distinct processing mechanisms for hierarchical and linear grammars, finding evidence of separate components and hierarchy sensitivity even on nonce inputs, independent of meaning.

Contribution

The study demonstrates that LLMs exhibit separate processing mechanisms for hierarchical and linear grammars, with hierarchy-sensitive components active on nonce inputs, indicating an intrinsic structural sensitivity.

Findings

01. Language models differentiate between hierarchical and linear inputs.

02. Distinct components are responsible for processing different grammar types.

03. Hierarchy sensitivity persists even on nonce, non-meaningful inputs.

Abstract

All natural languages are structured hierarchically. In humans, this structural restriction is neurologically coded: when two grammars are presented with identical vocabularies, brain areas responsible for language processing are only sensitive to hierarchical grammars. Using large language models (LLMs), we investigate whether such functionally distinct hierarchical processing regions can arise solely from exposure to large-scale language distributions. We generate inputs using English, Italian, Japanese, or nonce words, varying the underlying grammars to conform to either hierarchical or linear/positional rules. Using these grammars, we first observe that language models show distinct behaviors on hierarchical versus linearly structured inputs. Then, we find that the components responsible for processing hierarchical grammars are distinct from those that process linear grammars; we…
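
To make the hierarchical versus linear/positional distinction in the abstract concrete, here is a minimal Python sketch. It is not the authors' generation code: the nonce vocabulary, the marker token "na", the function names, and the two toy rules are all assumptions made for illustration. The linear rule places a marker at a fixed token position regardless of structure, while the hierarchical rule places it at a phrase boundary, so its position shifts with the length of the subject phrase.

```python
import random

# Hypothetical nonce vocabulary; not taken from the paper.
NONCE_WORDS = ["blick", "dax", "wug", "fep", "tove", "zib"]


def sample_phrase(rng, min_len, max_len):
    """Draw a random-length phrase of nonce words."""
    length = rng.randint(min_len, max_len)
    return [rng.choice(NONCE_WORDS) for _ in range(length)]


def linear_rule(tokens, marker="na", position=3):
    """Positional rule: insert the marker after a fixed number of tokens,
    ignoring any phrase structure."""
    out = list(tokens)
    out.insert(min(position, len(out)), marker)
    return out


def hierarchical_rule(subject_phrase, verb_phrase, marker="na"):
    """Structure-dependent rule: insert the marker at the boundary between
    the subject phrase and the verb phrase, whatever the subject's length."""
    return list(subject_phrase) + [marker] + list(verb_phrase)


if __name__ == "__main__":
    rng = random.Random(0)
    subject = sample_phrase(rng, 1, 3)
    verb = sample_phrase(rng, 2, 4)
    print("linear:      ", " ".join(linear_rule(subject + verb)))
    print("hierarchical:", " ".join(hierarchical_rule(subject, verb)))
```

Because both toy rules draw on the same vocabulary, any difference between the resulting inputs comes from the rule type alone, which is the kind of contrast the abstract describes.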

Code & Models

Repositories

arunasank/disjoint-processing-llms (PyTorch, official)