Entity Cloze By Date: What LMs Know About Unseen Entities

Yasumasa Onoe; Michael J.Q. Zhang; Eunsol Choi; Greg Durrett

arXiv:2205.02832 · cs.CL · May 6, 2022

Open Access

TL;DR

This paper introduces a dataset and framework for evaluating what language models can infer about new entities that did not exist when the models were pretrained. It finds that such inference remains difficult and that giving a model a textual definition of the entity helps.

Contribution

The paper presents a dataset of entities indexed by origination date and paired with their English Wikipedia articles, along with a method for evaluating what LMs know about unseen entities by measuring perplexity on masked spans within sentences from those articles.
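To make the perplexity-based evaluation concrete, the sketch below computes the perplexity of a masked span from per-token log-probabilities. It assumes the standard token-level definition (the exponential of the mean negative log-likelihood); the paper's exact normalization may differ. The function name `span_perplexity` and the probability values are hypothetical and chosen only for illustration.

```python
import math
from typing import Sequence


def span_perplexity(token_logprobs: Sequence[float]) -> float:
    """Perplexity of a span, given natural-log probabilities of its tokens.

    Assumption: perplexity = exp(mean negative log-likelihood per token).
    """
    if not token_logprobs:
        raise ValueError("span must contain at least one token")
    avg_nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(avg_nll)


# Hypothetical log-probabilities an LM might assign to a two-token masked span,
# first from the sentence alone, then with an entity definition prepended.
without_definition = [math.log(0.05), math.log(0.10)]
with_definition = [math.log(0.20), math.log(0.40)]

ppl_without = span_perplexity(without_definition)
ppl_with = span_perplexity(with_definition)
print(f"without definition: {ppl_without:.2f}")
print(f"with definition:    {ppl_with:.2f}")
```

Lower perplexity means the model found the masked span more predictable, which is the sense in which a better-informed model scores better on this kind of benchmark.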

Findings

01

Models with access to a textual definition of an entity achieve lower perplexity on the benchmark.

02

Making inferences about new entities remains difficult for LMs.

03

The dataset enables ongoing evaluation and improvement of LM knowledge about emerging entities.

Abstract

Language models (LMs) are typically trained once on a large-scale corpus and used for years without being updated. However, in a dynamic world, new entities constantly arise. We propose a framework to analyze what LMs can infer about new entities that did not exist when the LMs were pretrained. We derive a dataset of entities indexed by their origination date and paired with their English Wikipedia articles, from which we can find sentences about each entity. We evaluate LMs' perplexity on masked spans within these sentences. We show that models more informed about the entities, such as those with access to a textual definition of them, achieve lower perplexity on this benchmark. Our experimental results demonstrate that making inferences about new entities remains difficult for LMs. Given its wide coverage on entity knowledge and temporal indexing, our dataset can be used to evaluate…
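The temporal indexing described in the abstract can be illustrated with a small sketch that splits entities by whether they originated after a model's pretraining cutoff. Everything here is an assumption made for illustration: the `Entity` record, the `split_by_cutoff` helper, the placeholder entity names, and the cutoff date are hypothetical, and treating an entity that originated on the cutoff date as "seen" is an arbitrary choice rather than the paper's rule.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class Entity:
    name: str
    origination: date  # date the entity came into existence


def split_by_cutoff(
    entities: Iterable[Entity], cutoff: date
) -> Tuple[List[Entity], List[Entity]]:
    """Partition entities into (seen, unseen) relative to a pretraining cutoff."""
    seen: List[Entity] = []
    unseen: List[Entity] = []
    for entity in entities:
        # Assumption: origination on or before the cutoff counts as "seen".
        (seen if entity.origination <= cutoff else unseen).append(entity)
    return seen, unseen


# Placeholder entities and a hypothetical cutoff, not taken from the dataset.
entities = [
    Entity("entity_a", date(2018, 3, 1)),
    Entity("entity_b", date(2020, 7, 15)),
    Entity("entity_c", date(2021, 11, 30)),
]
seen, unseen = split_by_cutoff(entities, cutoff=date(2019, 12, 31))
print([e.name for e in unseen])
```

Sentences about the `unseen` group would then be the ones used to probe what a model can infer about entities it could not have encountered during pretraining.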

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Data Quality and Management