Word Embedding Perturbation for Sentence Classification

Dongxu Zhang; Zhichao Yang

arXiv:1804.08166 · cs.CL · April 24, 2018 · 35 cites

TL;DR

This paper explores various noise-based data augmentation techniques applied to word embeddings to reduce overfitting in sentence classification models, demonstrating improved performance across multiple tasks.

Contribution

The paper perturbs input word embeddings with Gaussian, Bernoulli, and adversarial noise, applies several constraints to that noise, and reports improved sentence classification over baseline models.

Findings

01 Improved classification accuracy with noise augmentation

02 Gaussian, Bernoulli, and adversarial noise effective

03 Constraints on noise improve robustness

Abstract

In this technical report, we aim to mitigate the overfitting problem in natural language processing by applying data augmentation methods. Specifically, we perturb the input word embeddings with several types of noise, including Gaussian noise, Bernoulli noise, and adversarial noise. We also apply several constraints to the different types of noise. With these data augmentation methods, the baseline models improve on several sentence classification tasks.
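
The three noise types named in the abstract can be illustrated with a minimal NumPy sketch. It is not taken from the paper or its repository: the abstract does not give the exact formulations, so the additive Gaussian form, the dropout-style Bernoulli mask, the L2-normalized gradient step, and every name below (gaussian_noise, bernoulli_noise, adversarial_noise, toy_loss_and_grad, sigma, keep_prob, epsilon) are assumptions chosen for illustration. The toy classifier is a logistic regression over mean-pooled embeddings, used only to obtain a gradient for the adversarial case.

```python
import numpy as np


def gaussian_noise(emb, sigma, rng):
    """Additive Gaussian noise: e' = e + n, with n ~ N(0, sigma^2) per element."""
    return emb + rng.normal(0.0, sigma, size=emb.shape)


def bernoulli_noise(emb, keep_prob, rng):
    """Multiplicative Bernoulli mask (dropout-style, an assumed formulation).

    Each element is kept with probability keep_prob and rescaled by
    1 / keep_prob so the expected value of the embedding is unchanged.
    """
    mask = rng.random(emb.shape) < keep_prob
    return emb * mask / keep_prob


def adversarial_noise(emb, grad, epsilon):
    """Step of L2 norm epsilon in the direction of the loss gradient."""
    norm = np.linalg.norm(grad)
    if norm == 0.0:
        return emb.copy()
    return emb + epsilon * grad / norm


def toy_loss_and_grad(emb, w, y):
    """Hypothetical classifier: logistic regression on mean-pooled embeddings.

    emb has shape (num_tokens, dim), w has shape (dim,), y is 0.0 or 1.0.
    Returns the binary cross-entropy loss and its gradient w.r.t. emb.
    """
    num_tokens = emb.shape[0]
    z = emb.mean(axis=0) @ w
    # Numerically stable cross-entropy: log(1 + exp(z)) - y * z
    loss = np.logaddexp(0.0, z) - y * z
    p = 1.0 / (1.0 + np.exp(-z))
    # d(loss)/d(emb[t]) = (p - y) * w / num_tokens for every token t
    grad = np.tile((p - y) * w / num_tokens, (num_tokens, 1))
    return loss, grad


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    sentence = rng.normal(size=(5, 8))  # 5 tokens, 8-dimensional embeddings
    weights = rng.normal(size=8)

    noisy_gauss = gaussian_noise(sentence, sigma=0.1, rng=rng)
    noisy_bern = bernoulli_noise(sentence, keep_prob=0.9, rng=rng)

    loss, grad = toy_loss_and_grad(sentence, weights, y=1.0)
    noisy_adv = adversarial_noise(sentence, grad, epsilon=0.5)
    adv_loss, _ = toy_loss_and_grad(noisy_adv, weights, y=1.0)

    print("clean loss:", loss)
    print("adversarial loss:", adv_loss)
```

In a training loop, a perturbation like these would typically be applied to the embedded input only during training, with the clean embeddings used at evaluation time. The Gaussian and Bernoulli variants need no gradient, whereas the adversarial variant requires an extra backward pass to obtain the gradient of the loss with respect to the embeddings.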

Code & Models

Repositories

zhangdongxu/word-embedding-perturbation (tf)

Taxonomy

Topics: Topic Modeling · Sentiment Analysis and Opinion Mining · Natural Language Processing Techniques