Probing for Bridging Inference in Transformer Language Models

Onkar Pandit; Yufang Hou

arXiv:2104.09400·cs.CL·April 20, 2021

TL;DR

This paper investigates how well pre-trained transformer language models, especially BERT, capture bridging inference. It finds that attention heads in higher layers, and a few specific heads in particular, focus on bridging relations, and that the models can resolve bridging anaphora without fine-tuning.

Contribution

It provides the first detailed analysis of bridging inference in transformer models and introduces a masked token prediction approach to evaluate this capability without additional training.

Findings

01

Higher layers focus on bridging relations in BERT.

02

Specific attention heads consistently target bridging.

03

Pre-trained models can resolve bridging anaphora without fine-tuning.
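
As an illustration of findings 01 and 02, the sketch below scores each attention head by how much attention the anaphor tokens place on the antecedent tokens, then ranks the heads. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the averaging scheme, and the toy attention matrices are all hypothetical, and in practice the matrices would come from a BERT layer.

```python
from typing import List, Sequence, Tuple

# One head's attention: head_attn[i][j] is the weight token i places on token j.
HeadAttention = List[List[float]]


def anaphor_to_antecedent_attention(
    head_attn: HeadAttention,
    anaphor_idx: Sequence[int],
    antecedent_idx: Sequence[int],
) -> float:
    """Attention mass on antecedent tokens, averaged over anaphor tokens.

    The averaging scheme is an assumption made for this sketch.
    """
    total = 0.0
    for i in anaphor_idx:
        total += sum(head_attn[i][j] for j in antecedent_idx)
    return total / len(anaphor_idx)


def rank_heads(
    layer_attn: List[HeadAttention],
    anaphor_idx: Sequence[int],
    antecedent_idx: Sequence[int],
) -> List[Tuple[int, float]]:
    """Return (head index, score) pairs sorted from highest to lowest score."""
    scores = [
        (h, anaphor_to_antecedent_attention(attn, anaphor_idx, antecedent_idx))
        for h, attn in enumerate(layer_attn)
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy layer with 2 heads over 3 tokens (hypothetical numbers, rows sum to 1).
# Token 0 plays the antecedent, token 2 the anaphor.
toy_layer = [
    [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1], [0.2, 0.2, 0.6]],  # head 0
    [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1], [0.7, 0.1, 0.2]],  # head 1
]
ranked = rank_heads(toy_layer, anaphor_idx=[2], antecedent_idx=[0])
```

Running the same ranking over every layer and many anaphor-antecedent pairs would show whether particular heads score highly on a consistent basis, which is the kind of pattern the findings describe.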

Abstract

We probe pre-trained transformer language models for bridging inference. We first investigate individual attention heads in BERT and observe that attention heads at higher layers prominently focus on bridging relations in comparison with the lower and middle layers; in addition, a few specific attention heads concentrate consistently on bridging. More importantly, we consider language models as a whole in our second approach, where bridging anaphora resolution is formulated as a masked token prediction task (Of-Cloze test). Our formulation produces promising results without any fine-tuning, which indicates that pre-trained language models substantially capture bridging inference. Our further investigation shows that the distance between anaphor and antecedent and the context provided to language models play an important role in the inference.
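
The masked token prediction formulation in the abstract can be sketched as follows. This is a hedged illustration only: the exact query template ("anaphor of [MASK]" appended to the context), the helper names `build_of_cloze` and `resolve_bridging`, and the toy scoring function are assumptions made here. A real run would replace `toy_mask_scores` with a masked language model's probabilities for the `[MASK]` position.

```python
from typing import Callable, Dict, List

MASK = "[MASK]"


def build_of_cloze(context: str, anaphor: str) -> str:
    """Build a cloze query of the form '<context> <anaphor> of [MASK].'"""
    return f"{context.strip()} {anaphor.strip()} of {MASK}."


def resolve_bridging(
    context: str,
    anaphor: str,
    candidates: List[str],
    mask_scores: Callable[[str], Dict[str, float]],
) -> str:
    """Pick the candidate antecedent with the highest score at the mask.

    `mask_scores` maps a query string to {token: score}; candidates missing
    from that mapping are treated as having the lowest possible score.
    """
    query = build_of_cloze(context, anaphor)
    scores = mask_scores(query)
    return max(candidates, key=lambda c: scores.get(c, float("-inf")))


def toy_mask_scores(query: str) -> Dict[str, float]:
    """Stand-in for a masked LM; the numbers are made up for illustration."""
    return {"house": 0.6, "garden": 0.3, "street": 0.1}


query = build_of_cloze("She walked up to the house.", "The door")
answer = resolve_bridging(
    "She walked up to the house.",
    "The door",
    candidates=["garden", "house"],
    mask_scores=toy_mask_scores,
)
```

Restricting the choice to a candidate list keeps the prediction inside the set of plausible antecedents rather than the model's whole vocabulary. Varying how much context is passed in is one way to study the context effect the abstract mentions.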

Code & Models

Repositories

oapandit/probBertForbridging (official)

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications

Methods: Multi-Head Attention · Linear Layer · Dropout · Adam · Dense Connections · Attention Is All You Need · Softmax · Linear Warmup With Linear Decay · WordPiece