Contextual Text Denoising with Masked Language Models
Yifu Sun, Haoming Jiang

TL;DR
This paper introduces a training-free contextual text denoising method that uses an off-the-shelf masked language model to make NLP systems more robust to noisy inputs.
Contribution
The paper presents a denoising algorithm that uses an existing masked language model to correct noisy text. It needs no retraining and no paired noisy/clean training data, so it can be integrated into an existing NLP system as is.
Findings
Improves downstream task performance on noisy texts
Effective under both synthetic and natural noise
Does not require retraining of language models
Abstract
Recently, with the help of deep learning models, significant advances have been made in different Natural Language Processing (NLP) tasks. Unfortunately, state-of-the-art models are vulnerable to noisy texts. We propose a new contextual text denoising algorithm based on the ready-to-use masked language model. The proposed algorithm does not require retraining of the model and can be integrated into any NLP system without additional training on paired cleaning training data. We evaluate our method under synthetic noise and natural noise and show that the proposed algorithm can use context information to correct noisy text and improve performance on noisy inputs in several downstream tasks.
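To make the idea in the abstract concrete, here is a minimal sketch of one way a masked language model could be used for denoising. It is not the authors' implementation, and the details are assumptions: the `fill_mask` interface, the in-vocabulary check used to decide which tokens are suspicious, the edit-distance filter, and the `max_dist` threshold are all hypothetical choices made for illustration. The `toy_fill_mask` function is a hard-coded stand-in so the example runs without downloading a model; a real system would query a masked language model at that point.

```python
from typing import Callable, List, Sequence, Set, Tuple

# Hypothetical interface (an assumption, not from the paper): given the tokens
# and the index of the masked position, return (candidate, probability) pairs.
FillMask = Callable[[Sequence[str], int], List[Tuple[str, float]]]


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a rolling-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def denoise(tokens: Sequence[str], fill_mask: FillMask,
            vocab: Set[str], max_dist: int = 2) -> List[str]:
    """Replace out-of-vocabulary tokens with a context-based candidate."""
    out = list(tokens)
    for i, tok in enumerate(out):
        if tok in vocab:
            continue  # assumption: in-vocabulary tokens are treated as clean
        # Keep only context proposals that are spelled similarly to the token.
        close = [(w, p) for w, p in fill_mask(out, i)
                 if edit_distance(w, tok) <= max_dist]
        if close:
            # Among the similar spellings, trust the context model's ranking.
            out[i] = max(close, key=lambda wp: wp[1])[0]
    return out


def toy_fill_mask(tokens: Sequence[str], i: int) -> List[Tuple[str, float]]:
    """Stand-in for a masked LM: proposals depend only on the previous word."""
    table = {
        "the": [("movie", 0.5), ("cat", 0.4), ("move", 0.1)],
        "was": [("great", 0.6), ("bad", 0.3), ("grate", 0.1)],
    }
    prev = tokens[i - 1] if i > 0 else ""
    return table.get(prev, [])


VOCAB = {"the", "was", "movie", "move", "cat", "great", "grate", "bad"}
cleaned = denoise(["the", "moive", "was", "graet"], toy_fill_mask, VOCAB)
print(" ".join(cleaned))
```

In this sketch the context model proposes words and the spelling filter keeps only the proposals close to the noisy token, so "moive" can become "movie" rather than "cat" even though both fit the context. Because the corrector only rewrites the input tokens, it can be placed in front of a downstream model without changing that model.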
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis
