Does BERT really agree ? Fine-grained Analysis of Lexical Dependence on a Syntactic Task
Karim Lasri, Alessandro Lenci, Thierry Poibeau

TL;DR
This paper investigates whether BERT can perform subject-verb number agreement independently of lexical cues. It finds that the model generalizes well on simple templates but loses lexical independence once attractors are introduced.
Contribution
The study provides a fine-grained analysis of BERT's syntactic generalization, especially its limitations in lexically-independent agreement tasks with attractors.
Findings
BERT performs well on simple templates without lexical cues.
BERT fails to generalize in a lexically independent way when attractors are present, even with as little as one attractor.
Performance on nonce sentences drops significantly once attractors are introduced.
Abstract
Although transformer-based Neural Language Models demonstrate impressive performance on a variety of tasks, their generalization abilities are not well understood. They have been shown to perform strongly on subject-verb number agreement in a wide array of settings, suggesting that they learned to track syntactic dependencies during their training even without explicit supervision. In this paper, we examine the extent to which BERT is able to perform lexically-independent subject-verb number agreement (NA) on targeted syntactic templates. To do so, we disrupt the lexical patterns found in naturally occurring stimuli for each targeted structure in a novel fine-grained analysis of BERT's behavior. Our results on nonce sentences suggest that the model generalizes well for simple templates, but fails to perform lexically-independent syntactic generalization when as little as one attractor…
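The evaluation idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' code: the lexicon, the template, and the names `make_nonce_item`, `agreement_accuracy`, and `nearest_noun_scorer` are all hypothetical, and the scorer is a toy stand-in rather than BERT. The sketch fills a fixed syntactic template with independently sampled words (so the sentence is grammatical but lexically arbitrary), masks the verb, and counts an item as correct when the scorer prefers the verb form that agrees with the subject.

```python
import random
from typing import Callable, List, Optional, Tuple

# Hypothetical toy lexicon; the paper's actual vocabulary and templates are not reproduced here.
LEXICON = {
    "NOUN_SG": ["author", "pilot", "teacher", "river"],
    "NOUN_PL": ["authors", "pilots", "teachers", "rivers"],
    "PREP": ["near", "behind", "beside"],
    "ADJ": ["tall", "green", "quiet"],
}
VERB_PAIRS = [("is", "are"), ("was", "were")]  # (singular, plural)
PLURAL_VERBS = {pl for _, pl in VERB_PAIRS}

Item = Tuple[str, str, str]  # (masked sentence, correct verb, incorrect verb)


def make_nonce_item(rng: random.Random, subject_number: str,
                    attractor_number: Optional[str] = None) -> Item:
    """Fill the template 'the N (P the N) [MASK] ADJ .' with randomly sampled words."""
    words = ["the", rng.choice(LEXICON[f"NOUN_{subject_number}"])]
    if attractor_number is not None:
        # An attractor is an intervening noun placed between subject and verb.
        words += [rng.choice(LEXICON["PREP"]), "the",
                  rng.choice(LEXICON[f"NOUN_{attractor_number}"])]
    sg, pl = rng.choice(VERB_PAIRS)
    correct, wrong = (sg, pl) if subject_number == "SG" else (pl, sg)
    words += ["[MASK]", rng.choice(LEXICON["ADJ"]), "."]
    return " ".join(words), correct, wrong


def agreement_accuracy(items: List[Item],
                       score: Callable[[str, str], float]) -> float:
    """Fraction of items where score(sentence, correct) > score(sentence, wrong)."""
    hits = sum(score(s, good) > score(s, bad) for s, good, bad in items)
    return hits / len(items)


def nearest_noun_scorer(sentence: str, verb: str) -> float:
    """Toy stand-in for a model: agrees with the noun closest to [MASK], not the subject."""
    tokens = sentence.split()
    before_mask = tokens[: tokens.index("[MASK]")]
    nearest = next(t for t in reversed(before_mask)
                   if t in LEXICON["NOUN_SG"] or t in LEXICON["NOUN_PL"])
    noun_is_plural = nearest in LEXICON["NOUN_PL"]
    return 1.0 if (verb in PLURAL_VERBS) == noun_is_plural else 0.0
```

With a singular subject and no attractor, the toy scorer reaches an accuracy of 1.0; with a plural attractor inserted, it falls to 0.0, because it follows the nearest noun. To probe an actual masked language model, `score` would be replaced by a function returning the model's probability for the candidate verb at the `[MASK]` position.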
Taxonomy
Methods: Attention Is All You Need · Linear Layer · Weight Decay · Dropout · Adam · Dense Connections · Attention Dropout · Multi-Head Attention · Layer Normalization
