Data Noising as Smoothing in Neural Network Language Models

Ziang Xie; Sida I. Wang; Jiwei Li; Daniel Lévy; Aiming Nie; Dan Jurafsky; Andrew Y. Ng

arXiv:1703.02573·cs.LG·March 9, 2017·173 cites

TL;DR

This paper establishes a connection between data noising in neural network language models and smoothing in n-gram models, leading to improved language modeling and machine translation performance.

Contribution

It introduces noising schemes inspired by smoothing techniques, bridging discrete sequence regularization with established n-gram smoothing methods.

Findings

01. Performance gains in language modeling
02. Enhanced machine translation results
03. Empirical validation of the noising-smoothing relationship

Abstract

Data noising is an effective technique for regularizing neural network models. While noising is widely adopted in application domains such as vision and speech, commonly used noising primitives have not been developed for discrete sequence-level settings such as language modeling. In this paper, we derive a connection between input noising in neural network language models and smoothing in $n$-gram models. Using this connection, we draw upon ideas from smoothing to develop effective noising schemes. We demonstrate performance gains when applying the proposed schemes to language modeling and machine translation. Finally, we provide empirical analysis validating the relationship between noising and smoothing.
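
To make the idea of input noising on discrete sequences concrete, the following is a minimal sketch of two simple token-level schemes of the kind the abstract alludes to: replacing a token with a placeholder ("blank" noising) and replacing it with a draw from the unigram frequency distribution ("unigram" noising), each applied independently per token with probability gamma. The function names, the toy corpus, the placeholder symbol, and the value of gamma are all illustrative assumptions; this is not the authors' implementation, and the smoothing-inspired refinements the paper develops are not shown.

```python
import random
from collections import Counter


def blank_noise(tokens, gamma, rng, blank="_"):
    """Replace each token with a placeholder with probability gamma.

    `blank` is a hypothetical placeholder symbol chosen for this sketch.
    """
    return [blank if rng.random() < gamma else tok for tok in tokens]


def unigram_noise(tokens, gamma, rng, vocab, weights):
    """Replace each token, with probability gamma, by a unigram sample.

    `vocab` and `weights` describe the unigram frequency distribution
    estimated from the training corpus.
    """
    return [
        rng.choices(vocab, weights=weights, k=1)[0]
        if rng.random() < gamma
        else tok
        for tok in tokens
    ]


# Toy corpus (an assumption for illustration only).
corpus = "the cat sat on the mat and the dog sat on the rug".split()

# Unigram counts serve as unnormalized sampling weights.
counts = Counter(corpus)
vocab = list(counts)
weights = [counts[w] for w in vocab]

rng = random.Random(0)  # fixed seed so the sketch is reproducible
noised_blank = blank_noise(corpus, gamma=0.2, rng=rng)
noised_unigram = unigram_noise(
    corpus, gamma=0.2, rng=rng, vocab=vocab, weights=weights
)
```

In a training loop, noising of this kind would typically be re-sampled each time a sequence is presented to the model, so that the network sees a different corrupted copy of the input on each pass.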

Code & Models

Repositories

stanfordmlgroup/nlm-noising (TensorFlow)

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Multimodal Machine Learning Applications