The Parrot Dilemma: Human-Labeled vs. LLM-augmented Data in Classification Tasks
Anders Giovanni Møller, Jacob Aarup Dalsgaard, Arianna Pera, Luca Maria Aiello

TL;DR
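This study compares human-labeled data with synthetic data generated by GPT-4 and Llama-2 in computational social science (CSS) classification tasks. Models trained on human-labeled data generally perform as well as or better than synthetically augmented ones, though augmentation helps with rare classes. LLMs perform well in zero-shot settings but lag behind trained classifiers.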
Contribution
It provides guidelines for data annotation in CSS, evaluating synthetic data's effectiveness and comparing LLM-based zero-shot classification to traditional classifiers.
Findings
Human-labeled data outperforms synthetic data in most cases.
Synthetic augmentation improves rare class performance.
LLMs perform well in zero-shot classification but lag behind trained classifiers.
Abstract
In the realm of Computational Social Science (CSS), practitioners often navigate complex, low-resource domains and face the costly and time-intensive challenges of acquiring and annotating data. We aim to establish a set of guidelines to address such challenges, comparing the use of human-labeled data with synthetically generated data from GPT-4 and Llama-2 in ten distinct CSS classification tasks of varying complexity. Additionally, we examine the impact of training data sizes on performance. Our findings reveal that models trained on human-labeled data consistently exhibit superior or comparable performance compared to their synthetically augmented counterparts. Nevertheless, synthetic augmentation proves beneficial, particularly in improving performance on rare classes within multi-class tasks. Furthermore, we leverage GPT-4 and Llama-2 for zero-shot classification and find that,…
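As a rough illustration of the augmentation setup the abstract describes, the sketch below tops up under-represented classes in a human-labeled training set with synthetic examples until each class reaches a minimum count. It is not the authors' pipeline: the helper name `augment_rare_classes`, the `min_per_class` threshold, and the toy data are assumptions made for illustration, and the synthetic pool stands in for text that an LLM such as GPT-4 or Llama-2 would generate.

```python
from collections import Counter, defaultdict


def augment_rare_classes(human, synthetic_pool, min_per_class):
    """Top up rare classes in `human` with examples from `synthetic_pool`.

    Both inputs are lists of (text, label) pairs.
    Hypothetical helper for illustration; not taken from the paper.
    """
    counts = Counter(label for _, label in human)

    # Group the synthetic examples by label so they can be drawn per class.
    pool_by_label = defaultdict(list)
    for text, label in synthetic_pool:
        pool_by_label[label].append((text, label))

    augmented = list(human)  # human-labeled examples are always kept
    for label, n in counts.items():
        deficit = max(0, min_per_class - n)
        # Add at most `deficit` synthetic examples for this class.
        augmented.extend(pool_by_label[label][:deficit])
    return augmented


# Toy data (assumed): "sarcasm" is the rare class.
human = [("a", "neutral"), ("b", "neutral"), ("c", "neutral"), ("d", "sarcasm")]
synthetic_pool = [
    ("s1", "sarcasm"),
    ("s2", "sarcasm"),
    ("s3", "sarcasm"),
    ("s4", "neutral"),
]

train = augment_rare_classes(human, synthetic_pool, min_per_class=3)
label_counts = Counter(label for _, label in train)
print(label_counts)
```

In this toy run only the rare class receives synthetic examples; classes that already meet the threshold are left as purely human-labeled data.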
Taxonomy
Topics: Machine Learning and Data Classification
Methods: Attention Is All You Need · Test · Linear Layer · Adam · Layer Normalization · Dense Connections · Label Smoothing · Dropout · Absolute Position Encodings · Multi-Head Attention
