Controllable Data Augmentation for Few-Shot Text Mining with Chain-of-Thought Attribute Manipulation

Letian Peng; Yuwei Zhang; Jingbo Shang

arXiv:2307.07099 · cs.CL · May 24, 2024 · 2 cites

TL;DR

This paper introduces CoTAM, a controllable data augmentation method for few-shot NLP tasks. It uses chain-of-thought prompting to directly manipulate a task-specific attribute in existing text, and it outperforms other LLM-based augmentation methods given the same number of training examples.

Contribution

We propose Chain-of-Thought Attribute Manipulation (CoTAM), a new approach that directly edits text attributes via chain-of-thought prompting for effective data augmentation.

Findings

1. CoTAM outperforms other LLM-based augmentation methods across multiple NLP tasks.

2. Augmented datasets reveal human-recognizable decision boundaries.

3. The method enhances both fine-tuning and in-context learning performance.

Abstract

Prompting large language models (LLMs) for data augmentation has recently become a common practice in few-shot NLP tasks. In this paper, we propose Chain-of-Thought Attribute Manipulation (CoTAM), a novel approach that generates new data from existing examples by tweaking only the user-provided, task-specific attribute, e.g., sentiment polarity or topic in movie reviews. Instead of controlling latent representations, as is conventional, we leverage chain-of-thought prompting to directly edit the text in three steps: (1) attribute decomposition, (2) manipulation proposal, and (3) sentence reconstruction. Extensive results on various tasks, such as text (pair) classification, aspect-based sentiment analysis, and conditional text generation, verify the superiority of CoTAM over other LLM-based augmentation methods with the same number of training examples for both fine-tuning and in-context…
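To make the three steps named in the abstract concrete, here is a minimal sketch of how a chain-of-thought prompt covering them might be assembled. The function name `build_cotam_style_prompt`, its parameters, and the prompt wording are all hypothetical illustrations; they are not the authors' actual prompt or code, which may differ substantially.

```python
def build_cotam_style_prompt(
    sentence: str,
    attribute: str,
    current_value: str,
    target_value: str,
) -> str:
    """Assemble a three-step chain-of-thought prompt.

    Hypothetical wording for illustration only; not the prompt used in the paper.
    """
    steps = [
        # Step 1: ask the model to enumerate the attributes of the input text.
        f"1. Attribute decomposition: list the attributes of the sentence, "
        f'including its {attribute} ("{current_value}").',
        # Step 2: ask for a plan that changes only the task-specific attribute.
        f"2. Manipulation proposal: propose how to change only the {attribute} "
        f'from "{current_value}" to "{target_value}" while keeping every other '
        f"attribute fixed.",
        # Step 3: ask for the rewritten sentence that follows the plan.
        "3. Sentence reconstruction: write a new sentence that follows the proposal.",
    ]
    return f'Sentence: "{sentence}"\n' + "\n".join(steps)


# Example: flip the sentiment of a movie review while leaving its topic unchanged.
prompt = build_cotam_style_prompt(
    sentence="The plot was dull.",
    attribute="sentiment",
    current_value="negative",
    target_value="positive",
)
print(prompt)
```

The resulting string would be sent to an LLM of the reader's choice; the reconstructed sentence from step 3, paired with the target attribute value as its label, would then serve as a new training example.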

Code & Models

Repositories

komeijiforce/cotam (PyTorch, official)

Taxonomy

Topics: Natural Language Processing Techniques · Topic Modeling · Text Readability and Simplification