Evaluating LLM Prompts for Data Augmentation in Multi-label Classification of Ecological Texts
Anna Glazkova, Olga Zakharova

TL;DR
This paper evaluates prompt-based data augmentation with large language models for multi-label classification of ecological texts, specifically detecting mentions of green practices in Russian social media, and reports performance gains over models trained without augmentation.
Contribution
It introduces and compares several prompt-based data augmentation strategies for ecological text classification and demonstrates that they outperform models fine-tuned without augmented data.
Findings
All augmentation strategies improved classification performance.
Paraphrasing prompts yielded the best results.
Augmentation outperformed baseline models without data enhancement.
Abstract
Large language models (LLMs) play a crucial role in natural language processing (NLP) tasks, improving the understanding, generation, and manipulation of human language across domains such as translating, summarizing, and classifying text. Previous studies have demonstrated that instruction-based LLMs can be effectively utilized for data augmentation to generate diverse and realistic text samples. This study applied prompt-based data augmentation to detect mentions of green practices in Russian social media. Detecting green practices in social media aids in understanding their prevalence and helps formulate recommendations for scaling eco-friendly actions to mitigate environmental issues. We evaluated several prompts for augmenting texts in a multi-label classification task, either by rewriting existing datasets using LLMs, generating new data, or combining both approaches. Our results…
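The three augmentation modes named in the abstract (rewriting existing texts, generating new ones, or combining both) can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the prompt wording, the function names, and the `llm` callable (any function that maps a prompt string to generated text) are hypothetical and are not the prompts or pipeline used in the paper.

```python
from typing import Callable, Dict, List

# One training example: {"text": str, "labels": list of category names}.
Example = Dict[str, object]


def rewrite_prompt(text: str, labels: List[str]) -> str:
    """Hypothetical prompt asking the model to paraphrase an existing post."""
    return (
        "Paraphrase the following social media post. "
        f"Keep it relevant to these categories: {', '.join(labels)}.\n"
        f"Post: {text}"
    )


def generate_prompt(labels: List[str]) -> str:
    """Hypothetical prompt asking the model to write a new post from labels only."""
    return (
        "Write a short social media post that mentions "
        f"these green practices: {', '.join(labels)}."
    )


def augment(
    dataset: List[Example],
    llm: Callable[[str], str],
    strategy: str = "rewrite",
) -> List[Example]:
    """Return the original examples plus LLM-produced ones.

    strategy: "rewrite" (paraphrase each text), "generate" (new text per
    label set), or "combine" (both). New examples inherit the labels of
    the example they were derived from.
    """
    if strategy not in {"rewrite", "generate", "combine"}:
        raise ValueError(f"unknown strategy: {strategy}")

    augmented = list(dataset)  # keep the original data untouched
    for example in dataset:
        text = str(example["text"])
        labels = list(example["labels"])
        if strategy in ("rewrite", "combine"):
            augmented.append(
                {"text": llm(rewrite_prompt(text, labels)), "labels": labels}
            )
        if strategy in ("generate", "combine"):
            augmented.append(
                {"text": llm(generate_prompt(labels)), "labels": labels}
            )
    return augmented
```

Passing the model as a plain callable keeps the sketch independent of any particular LLM client; the augmented list would then be used to fine-tune the multi-label classifier in place of the original dataset.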
Taxonomy
Topics: Advanced Text Analysis Techniques · Text and Document Classification Technologies
