Contextual Text Denoising with Masked Language Models

Yifu Sun; Haoming Jiang

arXiv:1910.14080 · cs.CL · March 6, 2024 · 6 cites

TL;DR

This paper introduces a contextual text denoising method that uses an off-the-shelf masked language model, with no additional training, to make NLP systems more robust to noisy inputs.

Contribution

It presents an algorithm that uses an existing masked language model to correct noisy text. The algorithm requires no retraining and can be added to any NLP system without training on paired cleaning data.

Findings

01 Improves downstream task performance on noisy texts

02 Effective under both synthetic and natural noise

03 Does not require retraining of language models

Abstract

Recently, with the help of deep learning models, significant advances have been made in different Natural Language Processing (NLP) tasks. Unfortunately, state-of-the-art models are vulnerable to noisy texts. We propose a new contextual text denoising algorithm based on a ready-to-use masked language model. The proposed algorithm does not require retraining of the model and can be integrated into any NLP system without additional training on paired data for text cleaning. We evaluate our method under synthetic noise and natural noise and show that the proposed algorithm can use context information to correct noisy text and improve performance on noisy inputs in several downstream tasks.
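The abstract describes the idea only at a high level, so the following is a minimal sketch of one plausible reading, not the authors' algorithm. It assumes that a suspicious token is masked, that a masked language model proposes replacements from the surrounding context, and that a replacement is accepted only if it is close to the observed token in edit distance. To stay self-contained, a tiny count table built from a hypothetical corpus stands in for a real pretrained masked language model; the names `build_toy_mlm`, `denoise`, and `max_dist`, the out-of-vocabulary trigger, and the edit-distance filter are all illustrative assumptions.

```python
from collections import Counter, defaultdict

# Hypothetical mini-corpus standing in for a pretrained masked LM's knowledge.
CORPUS = [
    "the movie was great",
    "the movie was boring",
    "the food was great",
    "i loved the movie",
]


def build_toy_mlm(corpus):
    """Count which words fill each (left neighbor, right neighbor) slot."""
    table = defaultdict(Counter)
    for sentence in corpus:
        toks = ["<s>"] + sentence.split() + ["</s>"]
        for i in range(1, len(toks) - 1):
            table[(toks[i - 1], toks[i + 1])][toks[i]] += 1
    return table


def edit_distance(a, b):
    """Levenshtein distance computed with a rolling dynamic-programming row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def denoise(sentence, mlm, vocab, max_dist=2):
    """Replace out-of-vocabulary tokens with context-predicted near matches."""
    toks = sentence.split()
    out = list(toks)
    for i, tok in enumerate(toks):
        if tok in vocab:
            continue  # assumption: only unknown tokens are treated as noisy
        # "Mask" position i and look up what the context predicts there.
        left = out[i - 1] if i > 0 else "<s>"
        right = toks[i + 1] if i + 1 < len(toks) else "</s>"
        candidates = mlm.get((left, right), {})
        # Keep candidates spelled similarly to the noisy token; prefer the
        # smallest edit distance, then the highest context count.
        scored = [
            (edit_distance(tok, cand), -count, cand)
            for cand, count in candidates.items()
            if edit_distance(tok, cand) <= max_dist
        ]
        if scored:
            out[i] = min(scored)[2]
    return " ".join(out)


mlm = build_toy_mlm(CORPUS)
vocab = {word for sentence in CORPUS for word in sentence.split()}

print(denoise("the movei was graet", mlm, vocab))  # prints: the movie was great
```

In a real system the count table would be replaced by the top predictions of a pretrained masked language model at the masked position, which is what makes the approach training-free: the denoiser only queries the model and never updates it.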

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis