Dynamic Benchmarking of Masked Language Models on Temporal Concept Drift with Multiple Views

Katerina Margatina, Shuai Wang, Yogarshi Vyas, Neha Anna John, Yassine Benajiba, Miguel Ballesteros

arXiv:2302.12297 · cs.CL · February 27, 2023 · 1 citation

Open Access

TL;DR

This paper introduces a comprehensive framework for evaluating how well masked language models stay current with evolving factual knowledge over time, using dynamic, multi-granularity test sets derived from Wikidata.

Contribution

It presents a novel holistic framework that dynamically creates temporal test sets from Wikidata, constructs fine-grained splits (updated, new, and unchanged facts), and evaluates MLMs from multiple views to assess their robustness to temporal concept drift.

Findings

01

The framework enables evaluation at any time granularity (e.g. month, quarter, year).

02

Multi-view evaluation reveals how robust the models are to factual updates.

03

Eleven pretrained MLMs are benchmarked on temporal data.
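Evaluating at different time granularities amounts to grouping dated facts into periods. The sketch below assumes each fact is a hypothetical (subject, relation, object, date) tuple; the helpers `period_key` and `bucket_facts` and the placeholder entities are illustrative only and are not taken from the paper's code.

```python
from datetime import date


def period_key(d: date, granularity: str) -> str:
    """Map a date to a period label (hypothetical helper, not the authors' code)."""
    if granularity == "year":
        return f"{d.year}"
    if granularity == "quarter":
        # Months 1-3 -> Q1, 4-6 -> Q2, 7-9 -> Q3, 10-12 -> Q4.
        return f"{d.year}-Q{(d.month - 1) // 3 + 1}"
    if granularity == "month":
        return f"{d.year}-{d.month:02d}"
    raise ValueError(f"unsupported granularity: {granularity}")


def bucket_facts(facts, granularity):
    """Group (subject, relation, object, date) tuples into per-period test sets."""
    buckets = {}
    for subj, rel, obj, d in facts:
        buckets.setdefault(period_key(d, granularity), []).append((subj, rel, obj))
    return buckets


# Toy, made-up facts used purely for illustration.
facts = [
    ("entity_1", "relation_a", "entity_2", date(2021, 2, 10)),
    ("entity_3", "relation_b", "entity_4", date(2021, 5, 3)),
    ("entity_5", "relation_a", "entity_6", date(2021, 11, 20)),
]
quarters = bucket_facts(facts, "quarter")
print(sorted(quarters))  # ['2021-Q1', '2021-Q2', '2021-Q4']
```

Switching the `granularity` argument to `"month"` or `"year"` regroups the same facts without any other change, which is the property the first finding relies on.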

Abstract

Temporal concept drift refers to the problem of data changing over time. In NLP, that would entail that language (e.g. new expressions, meaning shifts) and factual knowledge (e.g. new concepts, updated facts) evolve over time. Focusing on the latter, we benchmark 11 pretrained masked language models (MLMs) on a series of tests designed to evaluate the effect of temporal concept drift, as it is crucial that widely used language models remain up-to-date with the ever-evolving factual updates of the real world. Specifically, we provide a holistic framework that (1) dynamically creates temporal test sets of any time granularity (e.g. month, quarter, year) of factual data from Wikidata, (2) constructs fine-grained splits of tests (e.g. updated, new, unchanged facts) to ensure comprehensive analysis, and (3) evaluates MLMs in three distinct ways (single-token probing, multi-token…
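The updated / new / unchanged split in step (2) can be illustrated by comparing two snapshots of facts. This is a minimal sketch under a simplifying assumption: each snapshot is a hypothetical dictionary mapping a (subject, relation) pair to a single object, whereas real Wikidata relations can hold several values. The function `split_facts` and the toy entities are illustrative, not the authors' implementation.

```python
from typing import Dict, List, Tuple

Fact = Tuple[str, str, str]


def split_facts(old: Dict[Tuple[str, str], str],
                new: Dict[Tuple[str, str], str]) -> Dict[str, List[Fact]]:
    """Label each fact in the newer snapshot relative to the older one.

    Assumes one object per (subject, relation) pair (a simplification).
    """
    splits: Dict[str, List[Fact]] = {"unchanged": [], "updated": [], "new": []}
    for key, obj in new.items():
        if key not in old:
            splits["new"].append((*key, obj))        # pair did not exist before
        elif old[key] != obj:
            splits["updated"].append((*key, obj))    # same pair, changed object
        else:
            splits["unchanged"].append((*key, obj))  # same pair, same object
    return splits


# Toy, made-up snapshots used purely for illustration.
old = {("entity_1", "relation_a"): "entity_2",
       ("entity_3", "relation_b"): "entity_4"}
new = {("entity_1", "relation_a"): "entity_7",
       ("entity_3", "relation_b"): "entity_4",
       ("entity_5", "relation_a"): "entity_6"}
splits = split_facts(old, new)
print({name: len(items) for name, items in splits.items()})
# {'unchanged': 1, 'updated': 1, 'new': 1}
```

Keying on the (subject, relation) pair makes an updated fact easy to detect: the pair exists in both snapshots but its object has changed.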
