Mitigating Temporal-Drift: A Simple Approach to Keep NER Models Crisp
Shuguang Chen, Leonardo Neves, and Thamar Solorio

TL;DR
This paper introduces a simple, efficient method to counteract temporal drift in social media NER models by selecting the most informative training instances based on tweet trendiness, improving accuracy with less data.
Contribution
It proposes a novel metric for measuring tweet trendiness to select training data, reducing the need for extensive retraining and annotation.
Findings
Larger increases in prediction accuracy than alternative approaches.
These gains are achieved with less training data.
Evaluated on three state-of-the-art models on the Temporal Twitter Dataset.
Abstract
Performance of neural models for named entity recognition degrades over time, becoming stale. This degradation is due to temporal drift, the change in our target variables' statistical properties over time. This issue is especially problematic for social media data, where topics change rapidly. In order to mitigate the problem, data annotation and retraining of models is common. Despite its usefulness, this process is expensive and time-consuming, which motivates new research on efficient model updating. In this paper, we propose an intuitive approach to measure the potential trendiness of tweets and use this metric to select the most informative instances to use for training. We conduct experiments on three state-of-the-art models on the Temporal Twitter Dataset. Our approach shows larger increases in prediction accuracy with less training data than the alternatives, making it an…
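The abstract describes scoring tweets by potential trendiness and keeping the highest-scoring instances for training, but it does not say how the score is computed. The sketch below is therefore only an illustration of the general idea, not the authors' metric. It assumes a hypothetical score: the average log-ratio of smoothed token frequencies in a recent time window versus an older one. The function names (token_counts, trendiness, select_trending) and whitespace tokenization are also assumptions made for this example.

```python
import math
from collections import Counter


def token_counts(tweets):
    """Count lowercase whitespace-separated tokens across a list of tweets."""
    counts = Counter()
    for tweet in tweets:
        counts.update(tweet.lower().split())
    return counts


def trendiness(tweet, recent_counts, past_counts, vocab_size):
    """Hypothetical trendiness score (an assumption, not the paper's metric).

    Returns the mean log-ratio of add-one smoothed token probabilities in
    the recent window versus the past window. Positive values mean the
    tweet's tokens are more common recently than they used to be.
    """
    tokens = tweet.lower().split()
    if not tokens:
        return 0.0
    recent_total = sum(recent_counts.values())
    past_total = sum(past_counts.values())
    score = 0.0
    for tok in tokens:
        p_recent = (recent_counts[tok] + 1) / (recent_total + vocab_size)
        p_past = (past_counts[tok] + 1) / (past_total + vocab_size)
        score += math.log(p_recent / p_past)
    return score / len(tokens)


def select_trending(candidates, recent_tweets, past_tweets, k):
    """Return the k candidate tweets with the highest trendiness score."""
    recent_counts = token_counts(recent_tweets)
    past_counts = token_counts(past_tweets)
    vocab_size = len(set(recent_counts) | set(past_counts))
    ranked = sorted(
        candidates,
        key=lambda t: trendiness(t, recent_counts, past_counts, vocab_size),
        reverse=True,
    )
    return ranked[:k]
```

In a setup like this, the selected tweets would be the ones sent for annotation and used to update the NER model, so the annotation budget k is spent on instances whose vocabulary has shifted most. Any real system would need the paper's actual metric and a proper tweet tokenizer in place of these stand-ins.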
Taxonomy
Topics: Topic Modeling · Spam and Phishing Detection · Data Quality and Management
