Probing Syntax in Large Language Models: Successes and Remaining Challenges

Pablo J. Diego-Simón; Emmanuel Chemla; Jean-Rémi King; Yair Lakretz

arXiv:2508.03211 · cs.CL · August 12, 2025

Open Access

TL;DR

This paper critically examines the effectiveness of structural probes in extracting syntactic information from large language models, revealing biases, limitations, and challenges in representing complex syntactic structures.

Contribution

The paper systematically evaluates structural probes on three controlled benchmarks, highlighting their biases and their limitations in capturing deep syntactic structures.

Findings

01. Probes are biased by word proximity: the closer two words are in a sentence, the more likely a probe is to treat them as syntactically linked.

02. Probes struggle with deep syntactic structures and interacting nouns.

03. Probes are unaffected by word predictability.

Abstract

The syntactic structures of sentences can be readily read out from the activations of large language models (LLMs). However, the "structural probes" that have been developed to reveal this phenomenon are typically evaluated on an indiscriminate set of sentences. Consequently, it remains unclear whether structural and/or statistical factors systematically affect these syntactic representations. To address this issue, we conduct an in-depth analysis of structural probes on three controlled benchmarks. Our results are three-fold. First, structural probes are biased by a superficial property: the closer two words are in a sentence, the more likely structural probes will consider them as syntactically linked. Second, structural probes are challenged by linguistic properties: they poorly represent deep syntactic structures, and get interfered by interacting nouns or ungrammatical verb…
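
The proximity bias named in the abstract can be illustrated with a minimal sketch. It assumes a distance-based structural probe in the style of Hewitt and Manning (2019), where a linear map B is applied to word activations and squared distances in the projected space stand in for syntactic tree distances; the abstract itself does not specify the probe's form, so this is an assumption. The activations, the matrix B, and the gold edges below are hypothetical toy values chosen so that the projected space encodes only word position. They are not data from the paper.

```python
def probe_distance(B, h_i, h_j):
    """Squared L2 distance between two word vectors after the linear map B."""
    diff = [a - b for a, b in zip(h_i, h_j)]
    projected = [sum(w * x for w, x in zip(row, diff)) for row in B]
    return sum(p * p for p in projected)


def minimum_spanning_tree(dist):
    """Prim's algorithm on a symmetric distance matrix.

    Returns the set of undirected edges (i, j) with i < j.
    """
    n = len(dist)
    in_tree = {0}
    edges = set()
    while len(in_tree) < n:
        # Cheapest edge from a word inside the tree to a word outside it.
        i, j = min(
            ((i, j) for i in in_tree for j in range(n) if j not in in_tree),
            key=lambda e: dist[e[0]][e[1]],
        )
        in_tree.add(j)
        edges.add((min(i, j), max(i, j)))
    return edges


def uuas(pred_edges, gold_edges):
    """Undirected unlabeled attachment score: share of gold edges recovered."""
    return len(pred_edges & gold_edges) / len(gold_edges)


def mean_linear_distance(edges):
    """Average number of word positions spanned by an edge."""
    return sum(j - i for i, j in edges) / len(edges)


# Hypothetical toy sentence and gold dependency edges (illustrative only).
words = ["the", "dog", "near", "cats", "barks"]
gold = {(0, 1), (1, 4), (1, 3), (2, 3)}

# Hypothetical activations: the first coordinate is the word position,
# the second is arbitrary. B keeps only the first coordinate.
H = [[0, 5], [1, -2], [2, 7], [3, 0], [4, 1]]
B = [[1, 0]]

dist = [[probe_distance(B, hi, hj) for hj in H] for hi in H]
pred = minimum_spanning_tree(dist)

print("predicted edges:", sorted(pred))
print("UUAS:", uuas(pred, gold))
print("mean edge length, predicted:", mean_linear_distance(pred))
print("mean edge length, gold:", mean_linear_distance(gold))
```

In this constructed case the decoded tree is a chain of adjacent words, so its edges are shorter on average than the gold edges and only the gold edges that happen to link neighbours are recovered. Comparing edge lengths in predicted and gold trees in this way is one simple check for a proximity bias; it is not necessarily the analysis used in the paper.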

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Text Readability and Simplification