TL;DR
This study evaluates BERT's ability to learn novel verbs from only a few examples, focusing on syntactic alternations and selectional preferences. It finds robust generalization after just one or two exposures during fine-tuning.
Contribution
It introduces a few-shot word-learning paradigm that tests BERT's syntactic and semantic generalization to novel verbs.
Findings
BERT makes robust grammatical generalizations after just one or two fine-tuning instances of a novel verb
BERT shows a transitivity bias in verb behavior
Robust grammatical expectations are formed quickly
Abstract
Previous studies investigating the syntactic abilities of deep learning models have not targeted the relationship between the strength of the grammatical generalization and the amount of evidence to which the model is exposed during training. We address this issue by deploying a novel word-learning paradigm to test BERT's few-shot learning capabilities for two aspects of English verbs: alternations and classes of selectional preferences. For the former, we fine-tune BERT on a single frame in a verbal-alternation pair and ask whether the model expects the novel verb to occur in its sister frame. For the latter, we fine-tune BERT on an incomplete selectional network of verbal objects and ask whether it expects unattested but plausible verb/object pairs. We find that BERT makes robust grammatical generalizations after just one or two instances of a novel word in fine-tuning. For the verbal…
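The alternation test described in the abstract can be pictured as a before/after comparison: how strongly does the model expect the novel verb in the masked verb slot of the sister frame, before versus after fine-tuning on the other frame? The sketch below is a minimal illustration under stated assumptions, not the paper's actual metric, which this excerpt does not specify. The function names (`log_softmax`, `novel_verb_logprob`, `generalization_gain`) and the toy logits are hypothetical; in practice the logits would come from a masked language model's output at the masked position.

```python
import math
from typing import List, Sequence


def log_softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable log-softmax over a vocabulary-sized logit vector."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def novel_verb_logprob(logits: Sequence[float], verb_id: int) -> float:
    """Log-probability assigned to the novel verb at the masked slot."""
    return log_softmax(logits)[verb_id]


def generalization_gain(
    logits_before: Sequence[float],
    logits_after: Sequence[float],
    verb_id: int,
) -> float:
    """Assumed score: change in the novel verb's log-probability in the
    sister frame, comparing the model before and after fine-tuning."""
    return novel_verb_logprob(logits_after, verb_id) - novel_verb_logprob(
        logits_before, verb_id
    )


# Toy 4-token vocabulary; index 3 stands in for the novel verb (hypothetical data).
before = [2.0, 1.0, 0.5, -3.0]  # masked-slot logits before fine-tuning
after = [2.0, 1.0, 0.5, 1.5]    # masked-slot logits after fine-tuning
gain = generalization_gain(before, after, verb_id=3)
print(f"log-probability gain for the novel verb: {gain:.3f}")
```

A positive gain would indicate that exposure to one frame raised the model's expectation of the verb in the unseen sister frame. The same comparison could be applied to unattested verb/object pairs in the selectional-preference setting.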
Taxonomy
Methods: Linear Layer · Softmax · Dense Connections · WordPiece · Linear Warmup With Linear Decay · Attention Dropout · Residual Connection · Adam · Dropout · Weight Decay
