How Context Affects Language Models' Factual Predictions

Fabio Petroni; Patrick Lewis; Aleksandra Piktus; Tim Rocktäschel; Yuxiang Wu; Alexander H. Miller; Sebastian Riedel

arXiv:2005.04611 · cs.CL · May 12, 2020 · 80 cites

Open Access

TL;DR

This paper shows that augmenting a pre-trained language model with unsupervised retrieval substantially improves its factual predictions in zero-shot cloze-style question answering, and that its predictions can be made robust to noisy contexts.

Contribution

It combines a retrieval system with a pre-trained language model in a purely unsupervised way, improving factual accuracy and robustness without any supervised training.

Findings

01

Augmenting a pre-trained language model with retrieved context improves its factual prediction performance.

02

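Processing query and context with different segment tokens lets BERT use its Next Sentence Prediction classifier to judge whether the context is relevant.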

03

The system, despite being unsupervised, is competitive with a supervised machine reading baseline.

Abstract

When pre-trained on large unsupervised textual corpora, language models are able to store and retrieve factual knowledge to some extent, making it possible to use them directly for zero-shot cloze-style question answering. However, storing factual knowledge in a fixed number of weights of a language model clearly has limitations. Previous approaches have successfully provided access to information outside the model weights using supervised architectures that combine an information retrieval system with a machine reading component. In this paper, we go a step further and integrate information from a retrieval system with a pre-trained language model in a purely unsupervised way. We report that augmenting pre-trained language models in this way dramatically improves performance and that the resulting system, despite being unsupervised, is competitive with a supervised machine reading…
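The idea of feeding retrieved text to a language model alongside a cloze query can be illustrated with a minimal sketch. Everything here is an assumption for illustration, not the authors' code: the word-overlap `retrieve` function is a toy stand-in for a real retrieval system, whitespace splitting stands in for a real tokenizer, and the query-then-context ordering with per-segment ids follows the usual BERT two-segment input layout.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase word tokens, used only for the toy overlap retriever."""
    return re.findall(r"[a-z0-9]+", text.lower())


def retrieve(query, passages):
    """Return the passage sharing the most word tokens with the query.

    Hypothetical stand-in for a real retrieval system.
    """
    query_counts = Counter(tokenize(query))

    def overlap(passage):
        return sum((query_counts & Counter(tokenize(passage))).values())

    return max(passages, key=overlap)


def build_input(query, context):
    """Concatenate query and context, giving each its own segment id."""
    query_tokens = ["[CLS]"] + query.split() + ["[SEP]"]
    context_tokens = context.split() + ["[SEP]"]
    tokens = query_tokens + context_tokens
    # Segment 0 marks the cloze query, segment 1 marks the retrieved context.
    segment_ids = [0] * len(query_tokens) + [1] * len(context_tokens)
    return tokens, segment_ids


# Toy corpus and cloze query (illustrative data only).
passages = [
    "Paris is the capital and most populous city of France.",
    "The Nile is a major north-flowing river in Africa.",
]
query = "The capital of France is [MASK] ."

context = retrieve(query, passages)
tokens, segment_ids = build_input(query, context)
```

In a real setup, `tokens` and `segment_ids` would be converted to model inputs and a masked language model would predict the token at the `[MASK]` position; no supervised training of the reader is involved in this construction.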

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Speech and dialogue systems

Methods: Linear Layer · RoBERTa · Residual Connection · Attention Dropout · Linear Warmup With Linear Decay · Weight Decay · Dense Connections · Adam · WordPiece