Probing as Quantifying Inductive Bias

Alexander Immer; Lucas Torroba Hennigen; Vincent Fortuin; Ryan Cotterell

arXiv:2110.08388·cs.CL·March 28, 2022

TL;DR

This paper proposes a Bayesian framework for measuring the inductive bias that pre-trained language representations encode on a specific task. It is motivated by criticism of conventional probing, which has been observed to produce paradoxical and counter-intuitive results.

Contribution

It introduces a novel Bayesian approach to quantify inductive bias in representations, improving upon existing probing techniques and offering empirical evidence on model biases.

Findings

01. The framework mitigates problems with conventional probing, which has been observed to yield paradoxical and counter-intuitive results.

02. FastText can sometimes encode better inductive bias than BERT.

03. The framework offers a new way to understand model capabilities, by measuring inductive bias.

Abstract

Pre-trained contextual representations have led to dramatic performance improvements on a range of downstream tasks. Such performance improvements have motivated researchers to quantify and understand the linguistic information encoded in these representations. In general, researchers quantify the amount of linguistic information through probing, an endeavor which consists of training a supervised model to predict a linguistic property directly from the contextual representations. Unfortunately, this definition of probing has been subject to extensive criticism in the literature, and has been observed to lead to paradoxical and counter-intuitive results. In the theoretical portion of this paper, we take the position that the goal of probing ought to be measuring the amount of inductive bias that the representations encode on a specific task. We further describe a Bayesian framework that…
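For readers unfamiliar with the setup the abstract criticizes, the following is a minimal sketch of conventional probing: a supervised classifier is trained to predict a property directly from fixed representations, and held-out accuracy is read as the amount of encoded information. It is not the paper's Bayesian framework. The data are synthetic, and every name here (`train_linear_probe`, `probe_accuracy`, the two toy representations) is a hypothetical choice made for illustration.

```python
import numpy as np


def train_linear_probe(X, y, num_classes, lr=0.1, epochs=300, weight_decay=1e-3):
    """Fit a softmax-regression probe on frozen representations X (n x d)."""
    n, d = X.shape
    W = np.zeros((d, num_classes))
    b = np.zeros(num_classes)
    Y = np.eye(num_classes)[y]  # one-hot targets
    for _ in range(epochs):
        logits = X @ W + b
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        probs = np.exp(logits)
        probs /= probs.sum(axis=1, keepdims=True)
        grad = (probs - Y) / n  # gradient of mean cross-entropy w.r.t. logits
        W -= lr * (X.T @ grad + weight_decay * W)
        b -= lr * grad.sum(axis=0)
    return W, b


def probe_accuracy(X, y, W, b):
    """Fraction of examples whose property the probe predicts correctly."""
    return float(np.mean(np.argmax(X @ W + b, axis=1) == y))


# Synthetic stand-ins for two representation types (assumption: toy data,
# not real embeddings). One encodes the label, the other is pure noise.
rng = np.random.default_rng(0)
num_classes, dim, n = 3, 16, 600
y = rng.integers(0, num_classes, size=n)
centers = rng.normal(size=(num_classes, dim))
representations = {
    "informative": centers[y] + 0.5 * rng.normal(size=(n, dim)),
    "uninformative": rng.normal(size=(n, dim)),
}

train, test = slice(0, 400), slice(400, n)
results = {}
for name, X in representations.items():
    W, b = train_linear_probe(X[train], y[train], num_classes)
    results[name] = probe_accuracy(X[test], y[test], W, b)

for name, acc in results.items():
    print(f"{name}: held-out probe accuracy = {acc:.2f}")
```

In this sketch the probe's accuracy depends on the probe family chosen as well as on the representation, which is one reason the abstract argues for treating probing as a question of inductive bias rather than of raw probe performance.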

Code & Models

Repositories

rycolab/evidence-probing (PyTorch, official)

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Machine Learning and Algorithms

Methods: Multi-Head Attention · Attention Is All You Need · Linear Layer · Weight Decay · Softmax · Linear Warmup With Linear Decay · Residual Connection · WordPiece · Attention Dropout