Mitigating Temporal-Drift: A Simple Approach to Keep NER Models Crisp

Shuguang Chen, Leonardo Neves, and Thamar Solorio

arXiv:2104.09742·cs.CL·April 21, 2021

TL;DR

This paper introduces a simple, efficient method to counteract temporal drift in social media NER models by selecting the most informative training instances based on tweet trendiness, improving accuracy with less data.

Contribution

The paper proposes a metric for measuring the potential trendiness of tweets and uses it to select the most informative training instances, reducing the annotation and retraining needed to keep a model current.
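To make the idea of trendiness-based selection concrete, here is a minimal sketch in Python. It is an illustration only, not the authors' metric: this page does not give the paper's formula, so the scoring rule (a smoothed ratio of a token's relative frequency in a recent window to its frequency in an older window, averaged per tweet) and all function names are assumptions.

```python
from collections import Counter
from typing import Dict, List, Sequence

Tweet = Sequence[str]  # a tweet represented as a list of tokens


def token_trend_scores(
    old_tweets: Sequence[Tweet],
    new_tweets: Sequence[Tweet],
    smoothing: float = 1.0,
) -> Dict[str, float]:
    """Score each token by how much more frequent it is in the recent window.

    ASSUMPTION: a smoothed frequency ratio stands in for the paper's
    trendiness metric, which is not specified on this page.
    """
    old_counts = Counter(tok for tweet in old_tweets for tok in tweet)
    new_counts = Counter(tok for tweet in new_tweets for tok in tweet)
    old_total = sum(old_counts.values())
    new_total = sum(new_counts.values())
    vocab = set(old_counts) | set(new_counts)
    vocab_size = len(vocab)

    scores: Dict[str, float] = {}
    for tok in vocab:
        # Additive smoothing keeps unseen tokens from producing a zero rate.
        old_rate = (old_counts[tok] + smoothing) / (old_total + smoothing * vocab_size)
        new_rate = (new_counts[tok] + smoothing) / (new_total + smoothing * vocab_size)
        scores[tok] = new_rate / old_rate
    return scores


def tweet_trendiness(tweet: Tweet, scores: Dict[str, float]) -> float:
    """Average token score; tokens outside the vocabulary count as neutral (1.0)."""
    if not tweet:
        return 0.0
    return sum(scores.get(tok, 1.0) for tok in tweet) / len(tweet)


def select_most_trending(
    candidates: Sequence[Tweet], scores: Dict[str, float], k: int
) -> List[Tweet]:
    """Return the k candidate tweets with the highest trendiness."""
    ranked = sorted(candidates, key=lambda t: tweet_trendiness(t, scores), reverse=True)
    return ranked[:k]


old = [["the", "game", "was", "good"], ["the", "weather", "is", "good"]]
new = [["the", "vaccine", "rollout"], ["vaccine", "news", "is", "good"]]
scores = token_trend_scores(old, new)
picked = select_most_trending([["the", "game"], ["vaccine", "news"]], scores, k=1)
```

In this toy example the tweet about the newly frequent token ranks first, so it would be the one sent for annotation under a fixed labeling budget.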

Findings

01 Larger increases in prediction accuracy than the alternative approaches.

02 Gains achieved with less training data.

03 Evaluated with three state-of-the-art models on the Temporal Twitter Dataset.

Abstract

Performance of neural models for named entity recognition degrades over time, becoming stale. This degradation is due to temporal drift, the change in our target variables' statistical properties over time. This issue is especially problematic for social media data, where topics change rapidly. In order to mitigate the problem, data annotation and retraining of models is common. Despite its usefulness, this process is expensive and time-consuming, which motivates new research on efficient model updating. In this paper, we propose an intuitive approach to measure the potential trendiness of tweets and use this metric to select the most informative instances to use for training. We conduct experiments on three state-of-the-art models on the Temporal Twitter Dataset. Our approach shows larger increases in prediction accuracy with less training data than the alternatives, making it an…

Code & Models

Repositories

RiTUAL-UH/trending_NER
PyTorch · Official

Taxonomy

Topics: Topic Modeling · Spam and Phishing Detection · Data Quality and Management