Calibrating Pre-trained Language Classifiers on LLM-generated Noisy Labels via Iterative Refinement
Liqin Ye, Agam Shah, Chao Zhang, Sudheer Chava

TL;DR
This paper introduces SiDyP, a method that makes classifiers more robust to noisy labels generated by large language models. It iteratively refines label predictions with a simplex diffusion approach, which improves performance on NLP classification tasks.
Contribution
The paper presents SiDyP, a novel iterative refinement framework that calibrates classifier predictions to handle LLM-generated noisy labels, improving NLP classifier accuracy.
Findings
Increases BERT classifier performance by over 7% on noisy datasets.
Effectively refines noisy labels using neighborhood distribution and diffusion.
Demonstrates robustness across various LLMs and NLP tasks.
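The "neighborhood distribution" idea in the findings above can be illustrated with a small sketch. This is not the authors' SiDyP implementation: it assumes only that examples that sit close together in a classifier's embedding space tend to share a true label, and it builds a soft label estimate for each example from the noisy labels of its nearest neighbors. The function name, the cosine-similarity choice, and the parameter k are all hypothetical, and the diffusion step is not covered here.

```python
import numpy as np

def neighborhood_label_distribution(embeddings, noisy_labels, num_classes, k=5):
    """Soft label estimate per example from the noisy labels of its k nearest
    neighbors (illustrative assumption, not the paper's exact procedure)."""
    # Cosine similarity between all pairs of L2-normalized embeddings.
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / np.clip(norms, 1e-12, None)
    sim = unit @ unit.T
    # Exclude each example from its own neighborhood.
    np.fill_diagonal(sim, -np.inf)
    # Indices of the k most similar examples for every row.
    neighbors = np.argsort(-sim, axis=1)[:, :k]
    # Count neighbor labels per class, then normalize each row to sum to 1.
    one_hot = np.eye(num_classes)[noisy_labels]
    counts = one_hot[neighbors].sum(axis=1)
    return counts / counts.sum(axis=1, keepdims=True)

# Toy data: two clusters; the third example carries a flipped (noisy) label.
embeddings = np.array([
    [1.00, 0.00], [0.90, 0.10], [0.95, 0.05],
    [0.00, 1.00], [0.10, 0.90], [0.05, 0.95],
])
noisy_labels = np.array([0, 0, 1, 1, 1, 1])
soft_labels = neighborhood_label_distribution(embeddings, noisy_labels, num_classes=2, k=2)
print(soft_labels)
```

In this toy setup both nearest neighbors of the mislabeled third example carry label 0, so its neighborhood distribution points to class 0 even though its own noisy label is 1. A distribution like this could serve as a set of label candidates for a later refinement stage.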
Abstract
The traditional process of creating labeled datasets is labor-intensive and expensive. Recent breakthroughs in open-source large language models (LLMs) have opened up a new avenue for generating labeled datasets automatically for various natural language processing (NLP) tasks, providing an alternative to this expensive annotation process. However, the reliability of such auto-generated labels remains a significant concern due to inherent inaccuracies. When a model learns from noisy labels, its generalization is likely to be harmed because it is prone to overfitting the label noise. While previous studies on learning from noisy labels mainly focus on synthetic noise and real-world noise, LLM-generated label noise has received less attention. In this paper, we propose SiDyP: Simplex Label Diffusion with Dynamic Prior to calibrate the classifier's prediction, thus enhancing its robustness…
Taxonomy
Topics: Natural Language Processing Techniques · Speech Recognition and Synthesis
Methods: Attention Is All You Need · Linear Layer · Attention Dropout · Softmax · WordPiece · Weight Decay · Multi-Head Attention · Layer Normalization · Dropout
