Towards Robustness to Label Noise in Text Classification via Noise Modeling

Siddhant Garg; Goutham Ramakrishnan; Varun Thumbe

arXiv:2101.11214·cs.CL·June 22, 2022

TL;DR

This paper places an auxiliary noise model over a text classifier and uses a per-sample estimate of the probability that a label is noisy to guide training, improving robustness to label noise.

Contribution

It fits a beta mixture model to the training losses at an early epoch to score each training sample's probability of having a noisy label, then uses that score to selectively guide the learning of the noise model and the classifier.

Findings

01 Improves accuracy over the baseline when training labels are noisy

02 Prevents over-fitting to noisy labels

03 Shown effective on two text classification tasks

Abstract

Large datasets in NLP suffer from noisy labels, due to erroneous automatic and human annotation procedures. We study the problem of text classification with label noise, and aim to capture this noise through an auxiliary noise model over the classifier. We first assign a probability score to each training sample of having a noisy label, through a beta mixture model fitted on the losses at an early epoch of training. Then, we use this score to selectively guide the learning of the noise model and classifier. Our empirical evaluation on two text classification tasks shows that our approach can improve over the baseline accuracy, and prevent over-fitting to the noise.
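The scoring step described in the abstract can be illustrated with a minimal sketch. It assumes a two-component beta mixture fitted by EM with a weighted method-of-moments update, which is one common way to fit such a mixture and is not confirmed here as the authors' exact procedure. The function names `beta_pdf` and `fit_beta_mixture`, the min-max normalisation of losses, the initial parameters, and the iteration count are all hypothetical choices made for illustration.

```python
import math
import numpy as np

def beta_pdf(x, a, b):
    # Beta density computed through log-gamma for numerical stability.
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return np.exp(log_norm + (a - 1) * np.log(x) + (b - 1) * np.log1p(-x))

def fit_beta_mixture(losses, n_iter=20, eps=1e-4):
    """Fit a 2-component beta mixture to per-sample losses with EM.

    Returns, for every sample, the posterior probability of belonging to
    the component with the higher mean, read here as P(label is noisy).
    """
    x = np.asarray(losses, dtype=float)
    # Assumption: min-max normalise losses into (0, 1) and clip away from
    # the endpoints, where the beta density can diverge.
    x = (x - x.min()) / (x.max() - x.min() + 1e-12)
    x = np.clip(x, eps, 1 - eps)

    # Assumed initialisation: component 0 leans low-loss, component 1 high-loss.
    a = np.array([1.0, 2.0])
    b = np.array([2.0, 1.0])
    w = np.array([0.5, 0.5])

    def responsibilities():
        # E-step: posterior of each component for each sample.
        lik = np.stack([w[k] * beta_pdf(x, a[k], b[k]) for k in range(2)])
        return lik / (lik.sum(axis=0) + 1e-12)

    for _ in range(n_iter):
        resp = responsibilities()
        # M-step: weighted method-of-moments estimate of (a, b) per component.
        for k in range(2):
            r = resp[k] / (resp[k].sum() + 1e-12)
            mean = np.sum(r * x)
            var = np.sum(r * (x - mean) ** 2) + 1e-12
            common = mean * (1 - mean) / var - 1
            a[k] = max(mean * common, 1e-2)
            b[k] = max((1 - mean) * common, 1e-2)
        w = resp.sum(axis=1) / resp.sum()

    resp = responsibilities()
    # The component with the larger mean loss is treated as the noisy one.
    noisy = int(np.argmax(a / (a + b)))
    return resp[noisy]
```

In use, `losses` would be the per-sample training losses collected at an early epoch, and the returned scores could then weight or gate how each sample contributes to the noise model and the classifier; how the paper applies the score in detail is not specified in the abstract.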

Code & Models

Repositories

thumbe3/label-noise-nlp
PyTorch · Official
