Can neural networks acquire a structural bias from raw linguistic data?
Alex Warstadt, Samuel R. Bowman

TL;DR
This paper investigates whether BERT, a neural network model, acquires an inductive bias toward structural generalizations through pretraining on raw linguistic data. The results offer tentative evidence that some linguistic universals can be learned without innate biases.
Contribution
The paper offers empirical evidence, drawn from four experiments on BERT, that neural networks can acquire a structural bias from raw data. This supports the idea that some linguistic universals are learnable without innate predispositions.
Findings
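BERT makes a structural generalization in three of the four domains tested: subject-auxiliary inversion, reflexive binding, and verb tense detection in embedded clauses.
BERT makes a linear generalization when tested on negative polarity item (NPI) licensing.
The authors argue these results are the strongest evidence so far from artificial learners that a structural bias can be acquired from raw data.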
Abstract
We evaluate whether BERT, a widely used neural network for sentence processing, acquires an inductive bias towards forming structural generalizations through pretraining on raw data. We conduct four experiments testing its preference for structural vs. linear generalizations in different structure-dependent phenomena. We find that BERT makes a structural generalization in 3 out of 4 empirical domains (subject-auxiliary inversion, reflexive binding, and verb tense detection in embedded clauses) but makes a linear generalization when tested on NPI licensing. We argue that these results are the strongest evidence so far from artificial learners supporting the proposition that a structural bias can be acquired from raw data. If this conclusion is correct, it is tentative evidence that some linguistic universals can be acquired by learners without innate biases. However, the precise…
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Speech and dialogue systems
Methods: Linear Layer · Attention Dropout · Adam · Dense Connections · Residual Connection · Dropout · Linear Warmup With Linear Decay · Multi-Head Attention · Layer Normalization
