Causal Drawbridges: Characterizing Gradient Blocking of Syntactic Islands in Transformer LMs
Sasha Boguraev, Kyle Mahowald

TL;DR
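This paper investigates how Transformer language models process syntactic islands. The models replicate gradient human acceptability judgments for extraction from coordinated verb phrases, and causal interventions show that the filler-gap mechanisms used for canonical wh-dependencies are engaged in these constructions but selectively blocked to varying degrees.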
Contribution
The paper uses causal interventions to isolate functionally relevant subspaces in Transformer blocks, attention modules, and MLPs, and proposes a novel hypothesis about how the conjunction 'and' is represented in extractable versus non-extractable constructions.
Findings
Transformer language models replicate human acceptability judgments across the gradient of extraction from coordinated verb phrases.
Causal interventions reveal selective blocking of filler-gap mechanisms.
Conjunction 'and' is represented differently in extractable versus non-extractable constructions.
Abstract
We show how causal interventions in Transformer models provide insights into English syntax by focusing on a long-standing challenge for syntactic theory: syntactic islands. Extraction from coordinated verb phrases is often degraded, yet acceptability varies gradiently with lexical content (e.g., "I know what he hates art and loves" vs. "I know what he looked down and saw"). We show that modern Transformer language models replicate human judgments across this gradient. Using causal interventions that isolate functionally relevant subspaces in Transformer blocks, attention modules, and MLPs, we demonstrate that extraction from coordination islands engages the same filler-gap mechanisms as canonical wh-dependencies, but that these mechanisms are selectively blocked to varying degrees. By projecting a large corpus of unrelated text onto these causally identified subspaces, we derive a…
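To make the idea of intervening on a functionally relevant subspace concrete, here is a minimal NumPy sketch of a subspace interchange intervention. It is not the authors' implementation: the abstract does not say how the subspaces are found, so a random orthonormal basis stands in for a learned one, the hidden states are random vectors rather than real model activations, and the names `orthonormal_basis`, `interchange`, and `project` are hypothetical. The sketch swaps the component of a "base" hidden state that lies inside the subspace for the corresponding component of a "source" hidden state, and shows how any other vector can be projected onto the same subspace.

```python
import numpy as np


def orthonormal_basis(d, k, seed=0):
    """Return a (d, k) matrix with orthonormal columns.

    Assumption: a random basis stands in for a subspace that would
    normally be learned from intervention data.
    """
    rng = np.random.default_rng(seed)
    q, _ = np.linalg.qr(rng.standard_normal((d, k)))
    return q


def interchange(h_base, h_source, basis):
    """Replace the in-subspace component of h_base with that of h_source.

    Everything orthogonal to the subspace is left untouched.
    """
    delta = h_source - h_base
    return h_base + basis @ (basis.T @ delta)


def project(h, basis):
    """Coordinates of h in the k-dimensional subspace."""
    return basis.T @ h


if __name__ == "__main__":
    d, k = 16, 2  # hypothetical hidden size and subspace rank
    rng = np.random.default_rng(1)
    basis = orthonormal_basis(d, k)
    h_base = rng.standard_normal(d)    # stand-in for a base-sentence activation
    h_source = rng.standard_normal(d)  # stand-in for a source-sentence activation

    h_new = interchange(h_base, h_source, basis)

    # Inside the subspace, the patched state matches the source.
    print(np.allclose(project(h_new, basis), project(h_source, basis)))
    # Outside the subspace, the patched state matches the base.
    complement = np.eye(d) - basis @ basis.T
    print(np.allclose(complement @ h_new, complement @ h_base))
```

In a real experiment the patched state would be written back into the model's forward pass and the effect on the output distribution measured; `project` corresponds to the step of mapping activations from unrelated text onto the identified subspace.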
