Dual Debiasing for Noisy In-Context Learning for Text Generation
Siqi Liang, Sumyeong Ahn, Paramveer S. Dhillon, Jiayu Zhou

TL;DR
This paper introduces a dual debiasing framework that improves noise detection in in-context learning for text generation, making it more robust to high noise levels and enhancing overall performance.
Contribution
The paper proposes a novel dual debiasing method that corrects perplexity biases using synthesized neighbors, enabling accurate sample cleanliness assessment under noisy annotations.
Findings
Outperforms existing noise detection methods in high-noise scenarios
Achieves ICL performance comparable to that of clean demonstration sets
Remains robust even with extremely high noise ratios
Abstract
In-context learning (ICL) relies heavily on high-quality demonstrations drawn from large annotated corpora. Existing approaches detect noisy annotations by ranking local perplexities, presuming that noisy samples yield higher perplexities than their clean counterparts. However, this assumption breaks down when the noise ratio is high and many demonstrations are flawed. We reexamine the perplexity-based paradigm for text generation under noisy annotations, highlighting two sources of bias in perplexity: the annotation itself and the domain-specific knowledge inherent in large language models (LLMs). To overcome these biases, we introduce a dual debiasing framework that uses synthesized neighbors to explicitly correct perplexity estimates, yielding a robust Sample Cleanliness Score. This metric uncovers absolute sample cleanliness regardless of the overall corpus noise level. Extensive…
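To make the perplexity-ranking baseline mentioned in the abstract concrete, here is a minimal sketch. It is not the paper's dual debiasing method or its Sample Cleanliness Score, whose formulas are not given here. The function names, the `assumed_noise_ratio` parameter, and the per-token log-probabilities are all hypothetical; in practice the log-probabilities would come from an LLM scoring each annotation.

```python
import math


def perplexity(token_logprobs):
    """Perplexity as exp of the negative mean token log-probability."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def flag_noisy_by_rank(samples, assumed_noise_ratio):
    """Flag the highest-perplexity fraction of samples as noisy.

    `samples` maps a sample id to the log-probabilities of its annotation
    tokens. `assumed_noise_ratio` is a hypothetical knob: ranking is purely
    relative, so the caller must guess how many samples are noisy.
    """
    scored = sorted(
        ((perplexity(lp), sid) for sid, lp in samples.items()),
        reverse=True,
    )
    k = round(assumed_noise_ratio * len(scored))
    return {sid for _, sid in scored[:k]}


# Hypothetical log-probabilities for four annotated demonstrations.
samples = {
    "a": [-0.1, -0.2, -0.1],
    "b": [-2.0, -1.5, -2.5],
    "c": [-0.3, -0.2, -0.4],
    "d": [-1.0, -3.0, -2.0],
}
flagged = flag_noisy_by_rank(samples, assumed_noise_ratio=0.5)
print(sorted(flagged))
```

Because the ranking is relative, the number of flagged samples is fixed by the assumed ratio, not by how many samples are actually noisy. That dependence is what an absolute per-sample score is meant to avoid.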
