What Have Been Learned & What Should Be Learned? An Empirical Study of How to Selectively Augment Text for Classification
Biyang Guo, Songqiao Han, Hailiang Huang

TL;DR
This paper introduces STA, a selective text augmentation method that emphasizes important words for classification, leading to improved performance over traditional non-selective augmentation techniques.
Contribution
The paper systematically summarizes three kinds of role keywords that serve different functions in text classification, designs methods to extract them, and uses them to augment text selectively, improving classification accuracy, especially in low-resource scenarios.
Findings
STA outperforms non-selective augmentation methods on multiple datasets.
Selective augmentation improves classifier performance by emphasizing informative words.
The approach is effective across English and Chinese datasets.
Abstract
Text augmentation techniques are widely used in text classification problems to improve the performance of classifiers, especially in low-resource scenarios. Whilst lots of creative text augmentation methods have been designed, they augment the text in a non-selective manner, which means the less important or noisy words have the same chances to be augmented as the informative words, and thereby limits the performance of augmentation. In this work, we systematically summarize three kinds of role keywords, which have different functions for text classification, and design effective methods to extract them from the text. Based on these extracted role keywords, we propose STA (Selective Text Augmentation) to selectively augment the text, where the informative, class-indicating words are emphasized but the irrelevant or noisy words are diminished. Extensive experiments on four English and…
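The contrast between non-selective and selective augmentation described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' STA implementation: the function names, the `p_drop` parameter, and the hand-supplied keyword set are all hypothetical, and random deletion stands in for whatever augmentation operations the paper actually uses. The only point it shows is that a selective scheme protects class-indicating words while a non-selective scheme perturbs every word with equal probability.

```python
import random


def nonselective_delete(tokens, p_drop=0.3, seed=None):
    """Baseline: every token has the same chance of being dropped."""
    rng = random.Random(seed)
    kept = [t for t in tokens if rng.random() >= p_drop]
    # Fall back to the original tokens if everything was dropped.
    return kept if kept else list(tokens)


def selective_delete(tokens, keywords, p_drop=0.3, seed=None):
    """Illustrative selective variant: tokens in `keywords` are always kept;
    only the remaining tokens may be dropped with probability p_drop.

    `keywords` is a hand-supplied set here (an assumption); the paper
    extracts its role keywords with its own methods.
    """
    rng = random.Random(seed)
    kept = [
        t for t in tokens
        if t.lower() in keywords or rng.random() >= p_drop
    ]
    return kept if kept else list(tokens)


if __name__ == "__main__":
    sentence = "the movie was terrible and honestly quite boring".split()
    class_words = {"terrible", "boring"}  # hypothetical class-indicating words
    print(nonselective_delete(sentence, p_drop=0.4, seed=1))
    print(selective_delete(sentence, class_words, p_drop=0.4, seed=1))
```

In this sketch the selective variant can never remove "terrible" or "boring", so every augmented sample still carries the words that indicate the class, whereas the baseline may delete them and produce a noisier training example.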
Taxonomy
Topics: Text and Document Classification Technologies · Sentiment Analysis and Opinion Mining
