BootAug: Boosting Text Augmentation via Hybrid Instance Filtering Framework
Heng Yang, Ke Li

TL;DR
BootAug is a hybrid filtering framework that enhances text augmentation by maintaining feature space similarity, significantly improving classification accuracy across multiple datasets.
Contribution
Proposes BootAug, a transferable hybrid filtering framework that preserves feature space similarity, boosting the effectiveness of existing text augmentation methods on large datasets.
Findings
BootAug improves classification accuracy by approximately 2-3%.
It outperforms state-of-the-art augmentation methods on nine datasets.
It addresses the performance drop caused by feature space shift in augmented instances.
Abstract
Text augmentation is an effective technique for addressing the problem of insufficient data in natural language processing. However, existing text augmentation methods tend to focus on few-shot scenarios and usually perform poorly on large public datasets. Our research indicates that existing augmentation methods often generate instances with shifted feature spaces, which leads to a drop in performance on the augmented data (for example, EDA generally loses in aspect-based sentiment classification). To address this problem, we propose a hybrid instance-filtering framework (BootAug) based on pre-trained language models that can maintain a similar feature space with natural datasets. BootAug is transferable to existing text augmentation methods (such as synonym substitution and back translation) and significantly improves the augmentation performance by in…
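The idea of keeping augmented instances close to the feature space of natural data can be illustrated with a small sketch. This is not the paper's filtering procedure, which the truncated abstract does not spell out. It assumes a hypothetical setup in which each instance already has a feature vector (in practice these would come from a pre-trained language model) and uses a simple stand-in rule: keep an augmented instance only if its cosine similarity to the centroid of the natural-data features reaches a threshold. The function names, the centroid rule, and the `min_similarity` value are all illustrative assumptions.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def centroid(vectors: Sequence[Vector]) -> List[float]:
    """Component-wise mean of a non-empty list of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def filter_augmented(
    natural_feats: Sequence[Vector],
    augmented: Sequence[Tuple[str, Vector]],
    min_similarity: float = 0.8,  # assumed threshold, not from the paper
) -> List[str]:
    """Keep augmented texts whose features stay near the natural-data centroid."""
    center = centroid(natural_feats)
    return [
        text
        for text, feat in augmented
        if cosine(feat, center) >= min_similarity
    ]


# Toy 2-D features standing in for language-model embeddings.
natural = [[1.0, 0.0], [0.9, 0.1]]
candidates = [
    ("close paraphrase", [1.0, 0.05]),  # nearly parallel to the centroid
    ("drifted text", [0.0, 1.0]),       # nearly orthogonal to the centroid
]
kept = filter_augmented(natural, candidates)
print(kept)
```

With these toy vectors, only the instance whose features point in roughly the same direction as the natural data survives the filter. A real pipeline would need a more careful similarity criterion than a single centroid.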
Taxonomy
Topics: Topic Modeling · Sentiment Analysis and Opinion Mining · Natural Language Processing Techniques
