An Effective Deployment of Diffusion LM for Data Augmentation in Low-Resource Sentiment Classification
Zhuowei Chen, Lianxi Wang, Yuben Wu, Xinfeng Liao, Yujia Tian, Junyang Zhong

TL;DR
This paper introduces DiffusionCLS, a data augmentation method that uses a diffusion language model to improve low-resource sentiment classification. It generates new samples by reconstructing key emotional tokens, balancing the diversity and consistency of those samples.
Contribution
The paper proposes a diffusion LM-based data augmentation approach specifically designed for sentiment classification, emphasizing reconstruction of strong emotional tokens to enhance performance.
Findings
Effective in low-resource, domain-specific, and few-shot scenarios
Outperforms baseline data augmentation methods
Ablation studies validate each module's contribution
Abstract
Sentiment classification (SC) often suffers from low-resource challenges such as domain-specific contexts, imbalanced label distributions, and few-shot scenarios. The potential of the diffusion language model (LM) for textual data augmentation (DA) remains unexplored; moreover, textual DA methods struggle to balance the diversity and consistency of new samples. Most DA methods either perform logical modifications or use a language model to rephrase less important tokens in the original sequence. In the context of SC, strong emotional tokens can be critical to the sentiment of the whole sequence. Therefore, rather than rephrasing less important context, we propose DiffusionCLS, which leverages a diffusion LM to capture in-domain knowledge and generates pseudo samples by reconstructing strong label-related tokens. This approach ensures a balance between consistency and diversity, avoiding…
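The idea of targeting strong label-related tokens, rather than less important context, can be illustrated with a minimal Python sketch of the masking step only. This is not the authors' implementation: the function name `mask_strong_tokens`, the `ratio` parameter, and the toy score table `TOY_SCORES` are all hypothetical, and the abstract does not say how token strength is measured. The diffusion LM that would reconstruct the masked positions is not shown.

```python
from typing import Dict, List, Tuple

MASK = "[MASK]"

# Hypothetical sentiment-strength scores, for illustration only.
# How the paper scores tokens is not stated in the abstract.
TOY_SCORES: Dict[str, float] = {
    "terrible": 0.9, "awful": 0.85, "great": 0.8, "love": 0.75, "slow": 0.4,
}


def mask_strong_tokens(
    tokens: List[str], scores: Dict[str, float], ratio: float = 0.3
) -> Tuple[List[str], List[int]]:
    """Mask the highest-scoring tokens, up to `ratio` of the sequence length.

    Returns the masked token list and the sorted masked positions.
    Tokens with no positive score are never masked.
    """
    # Candidate positions, ranked from strongest to weakest score.
    ranked = sorted(
        (i for i, tok in enumerate(tokens) if scores.get(tok.lower(), 0.0) > 0.0),
        key=lambda i: scores[tokens[i].lower()],
        reverse=True,
    )
    # Mask at least one candidate when any exists.
    k = min(len(ranked), max(1, round(ratio * len(tokens)))) if ranked else 0
    chosen = sorted(ranked[:k])
    masked = [MASK if i in chosen else tok for i, tok in enumerate(tokens)]
    return masked, chosen


tokens = "The service was terrible and the food was awful".split()
masked, positions = mask_strong_tokens(tokens, TOY_SCORES, ratio=0.2)
print(" ".join(masked))
```

In a full pipeline, a generative model would fill the masked positions to produce a new pseudo sample; here the output simply shows which tokens would be handed over for reconstruction.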
Taxonomy
Topics: Face and Expression Recognition · Traffic Prediction and Management Techniques · Machine Learning and ELM
Methods: Diffusion
