Selective Text Augmentation with Word Roles for Low-Resource Text Classification

Biyang Guo; Songqiao Han; Hailiang Huang

arXiv:2209.01560 · cs.CL · September 7, 2022 · 6 cites

TL;DR

This paper introduces STA, a selective text augmentation method that leverages word roles based on their function in classification to generate more effective training samples for low-resource text classification tasks.

Contribution

The work proposes a novel role-based augmentation technique that strategically applies text edits to improve classifier performance in low-resource scenarios.

Findings

01 STA outperforms previous non-selective augmentation methods.

02 Augmented samples improve classifier accuracy significantly.

03 Method enhances cross-dataset generalization.

Abstract

Data augmentation techniques are widely used in text classification tasks to improve the performance of classifiers, especially in low-resource scenarios. Most previous methods conduct text augmentation without considering the different functionalities of the words in the text, which may generate unsatisfactory samples. Different words may play different roles in text classification, which inspires us to strategically select the proper roles for text augmentation. In this work, we first identify the relationships between the words in a text and the text category from the perspectives of statistical correlation and semantic similarity and then utilize them to divide the words into four roles -- Gold, Venture, Bonus, and Trivial words, which have different functionalities for text classification. Based on these word roles, we present a new augmentation technique called STA (Selective Text…
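The role assignment described in the abstract can be illustrated with a minimal sketch. It assumes each word already has a statistical-correlation score and a semantic-similarity score with respect to the text's category. The function name `assign_roles`, the median thresholds, the toy scores, and the specific mapping of high/low score combinations to the four roles are all assumptions made for illustration; the abstract does not state how the scores are computed or where the boundaries lie.

```python
from statistics import median


def assign_roles(corr, sim, corr_threshold=None, sim_threshold=None):
    """Assign one of four roles to each word from two per-word scores.

    corr: dict mapping word -> statistical-correlation score (hypothetical values)
    sim:  dict mapping word -> semantic-similarity score (hypothetical values)
    Thresholds default to the median of each score, which is an assumption.
    """
    if corr_threshold is None:
        corr_threshold = median(corr.values())
    if sim_threshold is None:
        sim_threshold = median(sim.values())

    roles = {}
    for word in corr:
        high_corr = corr[word] >= corr_threshold
        high_sim = sim[word] >= sim_threshold
        # Assumed mapping of score combinations to roles.
        if high_corr and high_sim:
            roles[word] = "Gold"
        elif high_corr:
            roles[word] = "Venture"
        elif high_sim:
            roles[word] = "Bonus"
        else:
            roles[word] = "Trivial"
    return roles


# Toy scores for a hypothetical "sports" category.
toy_corr = {"goal": 0.9, "yesterday": 0.8, "athletic": 0.2, "the": 0.1}
toy_sim = {"goal": 0.8, "yesterday": 0.1, "athletic": 0.7, "the": 0.05}

roles = assign_roles(toy_corr, toy_sim)
for word, role in roles.items():
    print(f"{word}: {role}")
```

A selective augmenter would then restrict each edit operation (for example deletion or replacement) to words of particular roles, rather than sampling edit positions uniformly across the text.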

Code & Models

Repositories

beyondguo/STA (PyTorch, official)

Taxonomy

Topics: Text and Document Classification Technologies · Sentiment Analysis and Opinion Mining · Topic Modeling