WikiTiDe: A Wikipedia-Based Timestamped Definition Pairs Dataset
Hsuvas Borkakoty, Luis Espinosa-Anke

TL;DR
WikiTiDe is a new dataset of timestamped definition pairs drawn from Wikipedia. It is designed to help models detect and adapt to changes in language and knowledge over time, and to support diachronic NLP research.
Contribution
The paper introduces WikiTiDe, an automatically generated dataset of timestamped definitions, and demonstrates its effectiveness in training models for detecting knowledge updates.
Findings
Bootstrapping improves dataset quality and model performance.
Fine-tuned models outperform baselines in change detection tasks.
The dataset supports training models for diachronic NLP applications.
Abstract
A fundamental challenge in the current NLP context, dominated by language models, is the inflexibility of current architectures when it comes to 'learning' new information. While model-centric solutions such as continual learning or parameter-efficient fine-tuning are available, the question remains of how to reliably identify changes in language or in the world. In this paper, we propose WikiTiDe, a dataset derived from pairs of timestamped definitions extracted from Wikipedia. We argue that such a resource can help accelerate diachronic NLP, specifically by enabling the training of models that scan knowledge resources for core updates concerning a concept, an event, or a named entity. Our proposed end-to-end method is fully automatic and leverages a bootstrapping algorithm to gradually create a high-quality dataset. Our results suggest that bootstrapping the seed version of WikiTiDe…
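The abstract does not spell out the bootstrapping procedure. As a rough illustration of the general idea only, the sketch below shows a generic self-training loop: train on a labeled seed, label the unlabeled pool, keep only high-confidence predictions, and repeat. Everything here is an assumption for illustration: `DefinitionPair`, `change_score`, `train`, `predict`, `bootstrap`, and the `min_conf`/`max_rounds` parameters are hypothetical, and a simple word-overlap threshold stands in for the fine-tuned models the paper actually uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DefinitionPair:
    # Hypothetical record: one term with an older and a newer definition.
    term: str
    old_def: str
    new_def: str


def change_score(pair):
    """1 - Jaccard similarity of the two word sets (higher = more change)."""
    a = set(pair.old_def.lower().split())
    b = set(pair.new_def.lower().split())
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)


def train(labeled):
    """Toy stand-in for a model: a threshold halfway between class means."""
    pos = [change_score(p) for p, y in labeled if y == 1]
    neg = [change_score(p) for p, y in labeled if y == 0]
    if not pos or not neg:
        raise ValueError("labeled data must contain both classes")
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2


def predict(threshold, pair):
    """Return (label, confidence); confidence is distance from the threshold."""
    s = change_score(pair)
    return (1 if s > threshold else 0), abs(s - threshold)


def bootstrap(seed, unlabeled, min_conf=0.2, max_rounds=5):
    """Grow the labeled set with confident predictions until none remain."""
    labeled = list(seed)
    pool = list(unlabeled)
    for _ in range(max_rounds):
        threshold = train(labeled)
        confident, rest = [], []
        for pair in pool:
            label, conf = predict(threshold, pair)
            (confident if conf >= min_conf else rest).append((pair, label))
        if not confident:
            break  # nothing new is trustworthy enough; stop growing
        labeled.extend(confident)
        pool = [p for p, _ in rest]
    return labeled, pool
```

Usage: seed with at least one changed pair (label 1) and one unchanged pair (label 0), then call `bootstrap(seed, unlabeled)`. Pairs whose score stays close to the threshold remain in the returned pool instead of being auto-labeled, which is the trade-off between dataset size and label quality that a bootstrapping approach has to manage.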
Taxonomy
Topics: Topic Modeling · Multimodal Machine Learning Applications · Natural Language Processing Techniques
