Word Embedding Perturbation for Sentence Classification
Dongxu Zhang, Zhichao Yang

TL;DR
This paper explores various noise-based data augmentation techniques applied to word embeddings to reduce overfitting in sentence classification models, demonstrating improved performance across multiple tasks.
Contribution
The paper introduces noise perturbation methods for word embeddings, together with constraints on that noise, to improve sentence classification accuracy.
Findings
Improved classification accuracy with noise augmentation
Gaussian, Bernoulli, and adversarial noise effective
Constraints on noise improve robustness
Abstract
In this technical report, we aim to mitigate the overfitting problem in natural language processing by applying data augmentation methods. Specifically, we perturb the input word embeddings with several types of noise, including Gaussian noise, Bernoulli noise, and adversarial noise. We also apply several constraints to the different types of noise. With these data augmentation methods, the baseline models improve on several sentence classification tasks.
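The three noise types named in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the noise scales (sigma, keep_prob, epsilon), the rescaling of the Bernoulli mask, the L2 normalization of the adversarial step, and the toy mean-pooled logistic classifier used to obtain a gradient are all assumptions made for illustration. The paper's noise constraints are not specified here and are not modeled.

```python
import numpy as np


def gaussian_noise(emb, sigma=0.1, rng=None):
    """Add zero-mean Gaussian noise to every embedding entry (sigma is an assumed scale)."""
    rng = np.random.default_rng() if rng is None else rng
    return emb + rng.normal(0.0, sigma, size=emb.shape)


def bernoulli_noise(emb, keep_prob=0.9, rng=None):
    """Multiply each entry by a Bernoulli(keep_prob) mask, rescaled so the expectation is unchanged.

    The 1/keep_prob rescaling is a dropout-style assumption, not taken from the paper.
    """
    rng = np.random.default_rng() if rng is None else rng
    mask = rng.binomial(1, keep_prob, size=emb.shape)
    return emb * mask / keep_prob


def toy_loss_and_grad(emb, w, b, y):
    """Hypothetical classifier: logistic regression on the mean-pooled sentence embedding.

    Returns the cross-entropy loss and its gradient with respect to emb.
    """
    pooled = emb.mean(axis=0)                      # (dim,)
    p = 1.0 / (1.0 + np.exp(-(pooled @ w + b)))    # predicted probability of class 1
    loss = -(y * np.log(p) + (1 - y) * np.log(1 - p))
    # d(loss)/d(emb[t]) = (p - y) * w / T for every token position t
    grad = np.tile((p - y) * w / emb.shape[0], (emb.shape[0], 1))
    return loss, grad


def adversarial_noise(emb, grad, epsilon=0.05):
    """Step of L2 norm epsilon along the loss gradient (an assumed, FGSM-like formulation)."""
    norm = np.linalg.norm(grad)
    if norm == 0.0:
        return emb.copy()
    return emb + epsilon * grad / norm


# Toy sentence: 5 tokens with 8-dimensional embeddings (random stand-ins for real vectors).
rng = np.random.default_rng(0)
emb = rng.normal(size=(5, 8))
w, b, y = rng.normal(size=8), 0.0, 1

emb_gauss = gaussian_noise(emb, sigma=0.1, rng=rng)
emb_bern = bernoulli_noise(emb, keep_prob=0.9, rng=rng)
loss_clean, grad = toy_loss_and_grad(emb, w, b, y)
emb_adv = adversarial_noise(emb, grad, epsilon=0.05)
loss_adv, _ = toy_loss_and_grad(emb_adv, w, b, y)
```

In a training loop, one of these perturbed copies would replace the clean embeddings at each step, so the classifier rarely sees the same input twice. The adversarial variant differs from the other two in that its direction depends on the current model and label rather than on random draws.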
Taxonomy
Topics: Topic Modeling · Sentiment Analysis and Opinion Mining · Natural Language Processing Techniques
