NoisyAG-News: A Benchmark for Addressing Instance-Dependent Noise in Text Classification
Hongfei Huang, Tingting Liang, Xixi Sun, Zikang Jin, Yuyu Yin

TL;DR
This paper introduces NoisyAG-News, a benchmark dataset for real-world, instance-dependent noise in text classification, revealing that pre-trained models are less effective against such noise and highlighting the need for new noise-handling methods.
Contribution
It constructs and analyzes a novel benchmark dataset for real-world noisy labels in text classification, emphasizing the differences from synthetic noise and evaluating model robustness.
Findings
Pre-trained models are resilient to synthetic noise.
Models struggle with real-world, instance-dependent noise.
Real-world noise patterns are more complex and challenging.
Abstract
Existing research on learning with noisy labels predominantly focuses on synthetic label noise. Although synthetic noise possesses well-defined structural properties, it often fails to accurately replicate real-world noise patterns. In recent years, there has been a concerted effort to construct generalizable and controllable instance-dependent noise datasets for image classification, significantly advancing the development of noise-robust learning in this area. However, studies on noisy label learning for text classification remain scarce. To better understand label noise in real-world text classification settings, we constructed the benchmark dataset NoisyAG-News through manual annotation. Initially, we analyzed the annotated data to gather observations about real-world noise. We qualitatively and quantitatively demonstrated that real-world noisy labels adhere to instance-dependent…
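For readers unfamiliar with the synthetic baseline the abstract contrasts against, the sketch below shows one common form of synthetic label noise: class-conditional (here symmetric) flipping driven by a transition matrix. It is an illustration only, not the paper's annotation procedure or evaluation code. The function names, the 0.2 noise rate, and the toy label array are assumptions made for this example; the only dataset fact used is that AG News has four classes. The "well-defined structural property" is that the flip probability depends on the clean class alone, never on the text of the instance, which is exactly what instance-dependent noise violates.

```python
import numpy as np

def symmetric_transition_matrix(num_classes, noise_rate):
    """Build T with T[i, j] = P(noisy = j | clean = i) for uniform flipping."""
    off_diag = noise_rate / (num_classes - 1)
    T = np.full((num_classes, num_classes), off_diag)
    np.fill_diagonal(T, 1.0 - noise_rate)
    return T

def inject_class_conditional_noise(clean_labels, T, seed=0):
    """Sample a noisy label for each example from the row of T for its clean class.

    The instance itself is never inspected: only the clean class matters.
    """
    rng = np.random.default_rng(seed)
    clean_labels = np.asarray(clean_labels)
    return np.array([rng.choice(T.shape[0], p=T[y]) for y in clean_labels])

def empirical_transition_matrix(clean_labels, noisy_labels, num_classes):
    """Estimate T by counting (clean, noisy) label pairs and normalizing rows."""
    counts = np.zeros((num_classes, num_classes))
    np.add.at(counts, (np.asarray(clean_labels), np.asarray(noisy_labels)), 1)
    row_sums = counts.sum(axis=1, keepdims=True)
    return counts / np.maximum(row_sums, 1)

# Toy demonstration with hypothetical labels (no real AG News data is loaded).
NUM_CLASSES = 4    # AG News has four classes
NOISE_RATE = 0.2   # arbitrary rate chosen for illustration

T = symmetric_transition_matrix(NUM_CLASSES, NOISE_RATE)
clean = np.repeat(np.arange(NUM_CLASSES), 5000)
noisy = inject_class_conditional_noise(clean, T)
T_hat = empirical_transition_matrix(clean, noisy, NUM_CLASSES)
```

With enough samples, `T_hat` recovers `T` closely, because a single matrix fully describes this kind of noise. For human-annotated labels, the same counting step still yields an average confusion matrix, but that matrix would no longer capture how the error rate varies from one instance to another.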
Taxonomy
Topics: Text and Document Classification Technologies · Anomaly Detection Techniques and Applications
