Probing as Quantifying Inductive Bias
Alexander Immer, Lucas Torroba Hennigen, Vincent Fortuin, Ryan Cotterell

TL;DR
This paper proposes a Bayesian framework to measure the inductive bias encoded in pre-trained language representations, addressing issues in traditional probing methods and providing new insights into model capabilities.
Contribution
It introduces a novel Bayesian approach to quantify inductive bias in representations, improving upon existing probing techniques and offering empirical evidence on model biases.
Findings
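The Bayesian framework mitigates problems with traditional probing, which has been observed to produce paradoxical and counter-intuitive results.
On some tasks, fastText can encode a better inductive bias than BERT.
The framework offers a way to understand what a representation is capable of in terms of the inductive bias it provides for a specific task.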
Abstract
Pre-trained contextual representations have led to dramatic performance improvements on a range of downstream tasks. Such performance improvements have motivated researchers to quantify and understand the linguistic information encoded in these representations. In general, researchers quantify the amount of linguistic information through probing, an endeavor which consists of training a supervised model to predict a linguistic property directly from the contextual representations. Unfortunately, this definition of probing has been subject to extensive criticism in the literature, and has been observed to lead to paradoxical and counter-intuitive results. In the theoretical portion of this paper, we take the position that the goal of probing ought to be measuring the amount of inductive bias that the representations encode on a specific task. We further describe a Bayesian framework that…
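To make the conventional definition of probing in the abstract concrete, here is a minimal sketch of a supervised probe: a softmax-regression classifier trained on frozen representations to predict a property label. It illustrates only the traditional accuracy-based setup that the paper critiques, not the paper's Bayesian framework. The synthetic vectors stand in for real contextual representations, and every name here (`train_linear_probe`, `probe_accuracy`, the class count, the dimensionality) is a hypothetical choice made for illustration.

```python
import numpy as np


def train_linear_probe(reps, labels, n_classes, lr=0.05, steps=500, weight_decay=1e-3):
    """Fit a softmax-regression probe on frozen representations (hypothetical helper)."""
    n, d = reps.shape
    W = np.zeros((d, n_classes))
    b = np.zeros(n_classes)
    onehot = np.eye(n_classes)[labels]
    for _ in range(steps):
        logits = reps @ W + b
        logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
        probs = np.exp(logits)
        probs = probs / probs.sum(axis=1, keepdims=True)
        grad = (probs - onehot) / n  # gradient of mean cross-entropy w.r.t. logits
        W -= lr * (reps.T @ grad + weight_decay * W)
        b -= lr * grad.sum(axis=0)
    return W, b


def probe_accuracy(W, b, reps, labels):
    """Fraction of examples whose property label the probe predicts correctly."""
    return float(np.mean(np.argmax(reps @ W + b, axis=1) == labels))


# Synthetic stand-in for representations: one Gaussian cluster per property value.
# Assumption: real experiments would use vectors from a pre-trained model instead.
rng = np.random.default_rng(0)
n_classes, dim, n_examples = 3, 16, 600
centers = 1.5 * rng.normal(size=(n_classes, dim))
labels = rng.integers(0, n_classes, size=n_examples)
reps = centers[labels] + rng.normal(size=(n_examples, dim))

train, test = slice(0, 400), slice(400, n_examples)
W, b = train_linear_probe(reps[train], labels[train], n_classes)
acc = probe_accuracy(W, b, reps[test], labels[test])
print(f"held-out probe accuracy: {acc:.3f}")
```

Held-out accuracy of a single, fixed probe is the quantity this kind of setup reports. The abstract's position is that the goal should instead be to measure how much inductive bias the representations encode for the task, which this sketch does not attempt.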
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Machine Learning and Algorithms
Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Weight Decay · Softmax · Linear Warmup With Linear Decay · Residual Connection · WordPiece · Attention Dropout
