Text Data Augmentation Made Simple By Leveraging NLP Cloud APIs
Claude Coulombe

TL;DR
This paper explores practical, scalable text data augmentation methods using NLP cloud APIs, demonstrating significant accuracy improvements across various neural network architectures for text classification tasks.
Contribution
It introduces and evaluates simple, robust text augmentation techniques leveraging NLP cloud APIs, providing scalable solutions for improving text classification accuracy.
Findings
Augmentation increased accuracy by 4.3% to 21.6%.
Techniques are scalable and easy to implement.
Improvements observed across multiple neural network architectures.
Abstract
In practice, it is common to find oneself with far too little text data to train a deep neural network. This "Big Data Wall" is a challenge for minority language communities on the Internet and for organizations, laboratories and companies that compete with the GAFAM (Google, Amazon, Facebook, Apple, Microsoft). While most research in text data augmentation aims at the long-term goal of finding end-to-end learning solutions, which is equivalent to "using neural networks to feed neural networks", this engineering work focuses on practical, robust, scalable and easy-to-implement data augmentation pre-processing techniques similar to those that are successful in computer vision. Several text augmentation techniques were tried. Some existing ones, such as noise injection or the use of regular expressions, were tested for comparison purposes. Others are…
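As a rough illustration of the noise injection baseline named in the abstract, the following is a minimal Python sketch of character-level noise applied as a pre-processing step. It is not the paper's implementation: the function names `inject_char_noise` and `augment`, the three edit operations, and the `rate`, `copies`, and `seed` parameters are all assumptions made for this example.

```python
import random

ALPHABET = "abcdefghijklmnopqrstuvwxyz"


def inject_char_noise(text, rate, rng):
    """Return a noisy copy of `text` (hypothetical helper, not from the paper).

    Each alphabetic character is edited with probability `rate` using one of
    three assumed operations: delete it, substitute a random lowercase letter,
    or swap it with the following character.
    """
    chars = list(text)
    out = []
    i = 0
    while i < len(chars):
        c = chars[i]
        if c.isalpha() and rng.random() < rate:
            op = rng.choice(("delete", "substitute", "swap"))
            if op == "delete":
                i += 1
                continue
            if op == "substitute":
                out.append(rng.choice(ALPHABET))
                i += 1
                continue
            if op == "swap" and i + 1 < len(chars):
                out.append(chars[i + 1])
                out.append(c)
                i += 2
                continue
        # Unedited character (or a swap attempted on the last character).
        out.append(c)
        i += 1
    return "".join(out)


def augment(examples, copies=2, rate=0.05, seed=0):
    """Keep the original (text, label) pairs and append noisy copies.

    Labels are carried over unchanged, which assumes the noise is light
    enough not to alter the class of the text.
    """
    rng = random.Random(seed)
    augmented = list(examples)
    for text, label in examples:
        for _ in range(copies):
            augmented.append((inject_char_noise(text, rate, rng), label))
    return augmented


if __name__ == "__main__":
    data = [("a very good movie", 1), ("a dull and slow plot", 0)]
    for text, label in augment(data, copies=2, rate=0.1, seed=42):
        print(label, text)
```

With `copies=2`, each labeled example yields two extra noisy variants, so the training set triples in size. The fixed `seed` keeps the augmented set reproducible between runs.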
Taxonomy
Topics: Natural Language Processing Techniques · Topic Modeling · Handwritten Text Recognition Techniques
Methods: Sigmoid Activation · Tanh Activation · Long Short-Term Memory
