Mechanisms vs. Outcomes: Probing for Syntax Fails to Explain Performance on Targeted Syntactic Evaluations

Ananth Agarwal; Jasper Jian; Christopher D. Manning; Shikhar Murty

arXiv:2506.16678 · cs.CL · November 11, 2025


TL;DR

This study investigates whether probing for syntactic features in transformer models accurately predicts their performance on syntactic tasks, revealing a disconnect between internal representations and observable behaviors.

Contribution

The paper shows that probing accuracy does not reliably predict syntactic performance, challenging assumptions about internal syntactic representations in large language models.

Findings

01. Probing accuracy for syntactic features does not predict downstream syntactic performance.

02. A substantial disconnect exists between latent syntactic representations and observable syntactic behaviors.

03. The study evaluates 32 open-weight transformer models across English syntactic phenomena.

Abstract

Large Language Models (LLMs) exhibit a robust mastery of syntax when processing and generating text. While this suggests an internalized understanding of hierarchical syntax and dependency relations, the precise mechanism by which they represent syntactic structure remains an open question in interpretability research. Probing provides one way to identify whether syntax is linearly encoded in activations; however, no comprehensive study has yet established whether a model's probing accuracy reliably predicts its downstream syntactic performance. Adopting a "mechanisms vs. outcomes" framework, we evaluate 32 open-weight transformer models and find that syntactic features extracted via probing fail to predict outcomes of targeted syntax evaluations across English linguistic phenomena. Our results highlight a substantial disconnect between latent syntactic representations found via…
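To make the "mechanisms vs. outcomes" comparison concrete, the following is a minimal sketch of how one might check whether per-model probing accuracy predicts per-model accuracy on a targeted syntactic evaluation. It is an illustration under stated assumptions, not the authors' analysis: the function names `pearson_r` and `r_squared` are hypothetical, and the two lists are made-up placeholder values, not results from the paper.

```python
from statistics import mean


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def r_squared(xs, ys):
    """Share of variance in ys explained by a linear fit on xs."""
    return pearson_r(xs, ys) ** 2


# Placeholder values for five hypothetical models (NOT data from the paper):
# one probing accuracy and one targeted-evaluation accuracy per model.
probe_acc = [0.70, 0.75, 0.80, 0.85, 0.90]
eval_acc = [0.82, 0.78, 0.85, 0.77, 0.83]

r2 = r_squared(probe_acc, eval_acc)
print(f"R^2 between probing accuracy and evaluation accuracy: {r2:.3f}")
```

An R^2 near zero across models would correspond to the situation the abstract describes, where probing accuracy carries little information about downstream syntactic performance. The statistic and any controls used in the paper itself may differ.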


Taxonomy

Topics: Neurobiology of Language and Bilingualism