Probing BERT's priors with serial reproduction chains

Takateru Yamakoshi; Thomas L. Griffiths; Robert D. Hawkins

arXiv:2202.12226 · cs.CL · March 21, 2022

TL;DR

This paper introduces a sampling method based on serial reproduction chains for probing what BERT's priors encode, and shows that sentences from Generative Stochastic Network (GSN) chains closely match the ground-truth corpus in their lexical and syntactic statistics.

Contribution

It proposes serial reproduction with a GSN sampler, which masks and reconstructs a randomly selected token on each step, as a consistent way to draw representative samples from BERT's priors.

Findings

01

GSN chains produce sentences with lexical and syntactic statistics close to the ground-truth corpus.

02

The method outperforms other sampling approaches in naturalness judgments.

03

The work establishes a theoretical foundation for bottom-up probing of language models.

Abstract

Sampling is a promising bottom-up method for exposing what generative models have learned about language, but it remains unclear how to generate representative samples from popular masked language models (MLMs) like BERT. The MLM objective yields a dependency network with no guarantee of consistent conditional distributions, posing a problem for naive approaches. Drawing from theories of iterated learning in cognitive science, we explore the use of serial reproduction chains to sample from BERT's priors. In particular, we observe that a unique and consistent estimator of the ground-truth joint distribution is given by a Generative Stochastic Network (GSN) sampler, which randomly selects which token to mask and reconstruct on each step. We show that the lexical and syntactic statistics of sentences from GSN chains closely match the ground-truth corpus distribution and perform better than…
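The sampling step described in the abstract (randomly select which token to mask, then reconstruct it) can be sketched as a short chain. This is a minimal illustration under stated assumptions, not the authors' implementation: `toy_conditional`, `gsn_step`, `run_chain`, and the tiny `VOCAB` are hypothetical names introduced here, and `toy_conditional` is a stand-in for the conditional distribution that a masked language model such as BERT would supply.

```python
import random
from typing import Callable, Dict, List

MASK = "[MASK]"
# Hypothetical toy vocabulary; a real MLM would use its WordPiece vocabulary.
VOCAB = ["the", "a", "cat", "dog", "sat", "ran"]


def toy_conditional(tokens: List[str], position: int) -> Dict[str, float]:
    """Stand-in for an MLM's p(token at `position` | rest of the sentence).

    Assumption for illustration only: words that repeat the left
    neighbour are down-weighted. A real model would return the softmax
    over its vocabulary at the masked position.
    """
    left = tokens[position - 1] if position > 0 else None
    weights = {w: (1.0 if w == left else 3.0) for w in VOCAB}
    total = sum(weights.values())
    return {w: v / total for w, v in weights.items()}


def gsn_step(
    tokens: List[str],
    conditional: Callable[[List[str], int], Dict[str, float]],
    rng: random.Random,
) -> List[str]:
    """Mask one uniformly chosen position and resample it."""
    position = rng.randrange(len(tokens))
    masked = tokens[:position] + [MASK] + tokens[position + 1:]
    dist = conditional(masked, position)
    words = list(dist)
    new_token = rng.choices(words, weights=[dist[w] for w in words], k=1)[0]
    return masked[:position] + [new_token] + masked[position + 1:]


def run_chain(
    initial: List[str],
    conditional: Callable[[List[str], int], Dict[str, float]],
    n_steps: int,
    seed: int = 0,
) -> List[List[str]]:
    """Run a serial reproduction chain and return every visited state."""
    rng = random.Random(seed)
    state = list(initial)
    samples = []
    for _ in range(n_steps):
        state = gsn_step(state, conditional, rng)
        samples.append(list(state))
    return samples


if __name__ == "__main__":
    for sentence in run_chain(["the", "cat", "sat"], toy_conditional, n_steps=5):
        print(" ".join(sentence))
```

Each state differs from the previous one in at most one position, so long chains are needed before samples become roughly independent of the starting sentence. To probe an actual model, `toy_conditional` would be replaced by a function that runs the masked sentence through the MLM and returns its predicted distribution at the masked position.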

Code & Models

Repositories

taka-yamakoshi/telephonegame (PyTorch, official)

Taxonomy

Methods: Attention Is All You Need · Linear Layer · Dense Connections · Softmax · Residual Connection · Weight Decay · Linear Warmup With Linear Decay · WordPiece · Layer Normalization