RPN: A Word Vector Level Data Augmentation Algorithm in Deep Learning for Language Understanding
Zhengqing Yuan, Xiaolong Zhang, Yue Wang, Xuecong Hou, Huiwen Xue, Zhuanzhe Zhao, Yongming Liu

TL;DR
The paper introduces RPN (Random Position Noise), a word-vector-level data augmentation algorithm that improves natural language understanding (NLU) models by adding noise to the embeddings of selected words. The authors report that it outperforms existing augmentation methods on several NLU tasks.
Contribution
RPN is a data augmentation technique that operates directly on word vectors. Because it needs no gradients in the computational graph when updating virtual samples, it is simpler to apply to large datasets, and it improves performance on NLU tasks.
Findings
RPN outperforms existing augmentation methods on multiple NLU tasks.
RPN achieves state-of-the-art results on seven NLU benchmarks.
RPN is effective in low-resource scenarios.
Abstract
Data augmentation is a widely used technique in machine learning to improve model performance. However, existing data augmentation techniques in natural language understanding (NLU) may not fully capture the complexity of natural language variations, and they can be challenging to apply to large datasets. This paper proposes the Random Position Noise (RPN) algorithm, a novel data augmentation technique that operates at the word vector level. RPN modifies the word embeddings of the original text by introducing noise based on the existing values of selected word vectors, allowing for more fine-grained modifications and better capturing natural language variations. Unlike traditional data augmentation methods, RPN does not require gradients in the computational graph during virtual sample updates, making it simpler to apply to large datasets. Experimental results demonstrate that RPN…
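The mechanism the abstract describes (pick some word positions, then perturb those word vectors with noise derived from their own values, without any gradient computation) can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `random_position_noise`, the parameters `select_prob` and `noise_scale`, the independent per-position selection, and the uniform multiplicative noise are all assumptions made for illustration, since the abstract does not give the exact selection rule or noise distribution.

```python
import random


def random_position_noise(embeddings, select_prob=0.15, noise_scale=0.1, seed=None):
    """Return a perturbed copy of `embeddings`, a list of word vectors.

    Assumptions (not taken from the paper): each position is selected
    independently with probability `select_prob`, and every component of a
    selected vector is multiplied by (1 + noise_scale * u), u ~ Uniform(-1, 1).
    """
    rng = random.Random(seed)
    augmented = []
    for vector in embeddings:
        if rng.random() < select_prob:
            # Noise is tied to the existing values: large components move
            # more, and components that are exactly zero stay zero.
            new_vector = [
                x * (1.0 + noise_scale * rng.uniform(-1.0, 1.0)) for x in vector
            ]
        else:
            # Copy unselected vectors so the input is never mutated.
            new_vector = list(vector)
        augmented.append(new_vector)
    return augmented


# Toy "sentence" of three 3-dimensional word vectors (made-up values).
sentence = [[0.2, -0.5, 1.0], [0.0, 0.3, -0.7], [1.5, 0.1, 0.4]]
virtual_sample = random_position_noise(sentence, select_prob=0.5, seed=0)
```

The sketch uses only forward arithmetic on the embedding values, which is the property the abstract highlights: producing a virtual sample requires no backward pass, in contrast to gradient-based adversarial perturbation methods.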
Taxonomy
Topics: Topic Modeling · Natural Language Processing Techniques · Speech Recognition and Synthesis
Methods: Random Position Noise (RPN)
