TL;DR
This paper addresses the lack of realistic continuous learning (CL) benchmarks for Named Entity Recognition (NER). It constructs a CL NER dataset from an existing public dataset, examines the challenges of realistic single-task CL, and evaluates data rehearsal as a way to mitigate accuracy loss.
Contribution
The paper introduces a CL NER dataset derived from an existing publicly available dataset, discusses the challenges of realistic CL, and evaluates data rehearsal as a strategy for mitigating accuracy loss.
Findings
Constructed a new CL NER dataset for realistic scenarios
Identified challenges in applying CL to NER tasks
Evaluated effectiveness of data rehearsal in maintaining accuracy
Abstract
There is increasing interest in continuous learning (CL), as data privacy is becoming a priority for real-world machine learning applications. Meanwhile, there is still a lack of academic NLP benchmarks that are applicable to realistic CL settings, which is a major challenge for the advancement of the field. In this paper, we discuss some of the unrealistic data characteristics of public datasets and study both the challenges of realistic single-task continuous learning and the effectiveness of data rehearsal as a way to mitigate accuracy loss. We construct a CL NER dataset from an existing publicly available dataset and release it along with the code to the research community.
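Data rehearsal, in the general sense, means replaying a sample of previously seen training examples alongside each new batch of data so the model is less likely to forget what it learned earlier. The abstract does not say how the paper samples or sizes its rehearsal data, so the following is only a minimal Python sketch of the generic idea. The function name `build_rehearsal_set`, the `rehearsal_ratio` parameter, and uniform random sampling are all assumptions made for illustration and are not taken from the paper or its released code.

```python
import random


def build_rehearsal_set(new_data, old_data, rehearsal_ratio=0.2, seed=0):
    """Mix new training examples with a random sample of older ones.

    Hypothetical helper: rehearsal_ratio is the size of the replayed sample
    relative to len(new_data); uniform sampling is an assumption.
    """
    rng = random.Random(seed)
    # Never request more old examples than are actually available.
    k = min(len(old_data), int(round(rehearsal_ratio * len(new_data))))
    replayed = rng.sample(list(old_data), k)
    mixed = list(new_data) + replayed
    # Shuffle so replayed examples are interleaved with the new ones.
    rng.shuffle(mixed)
    return mixed


# Example: 50 new sentences plus a 20% replay of 100 earlier sentences.
old_examples = [("old", i) for i in range(100)]
new_examples = [("new", i) for i in range(50)]
training_set = build_rehearsal_set(new_examples, old_examples, rehearsal_ratio=0.2)
```

In a CL setting, a set like `training_set` would be used for the next training round in place of the new data alone. The ratio trades off retention of earlier knowledge against how much old data must be stored.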
