Under the Surface: Tracking the Artifactuality of LLM-Generated Data

Debarati Das; Karin De Langis; Anna Martin-Boyle; Jaehyung Kim; Minhwa Lee; Zae Myung Kim; Shirley Anugrah Hayati; Risako Owan; Bin Hu; Ritik Parkar; Ryan Koo; Jonginn Park; Aahan Tyagi; Libby Ferland; Sanjali Roy; Vincent Liu; and Dongyeop Kang

arXiv:2401.14698 · cs.CL · January 31, 2024 · 2 citations

Open Access · 1 dataset

TL;DR

This paper investigates the quality and implications of various types of LLM-generated artificial data, revealing hidden disparities compared to human data, especially in complex tasks, and emphasizing ethical considerations in data creation.

Contribution

First comprehensive analysis aggregating diverse LLM-generated text data and evaluating its quality against human data across multiple benchmarks.

Findings

01. LLM-generated data can match human performance in some tasks.
02. Significant disparities exist in complex tasks that require nuanced understanding.
03. The study highlights ethical concerns and biases in LLM-generated content.

Abstract

This work delves into the expanding role of large language models (LLMs) in generating artificial data. LLMs are increasingly employed to create a variety of outputs, including annotations, preferences, instruction prompts, simulated dialogues, and free text. As these forms of LLM-generated data often intersect in their application, they exert mutual influence on each other and raise significant concerns about the quality and diversity of the artificial data incorporated into training cycles, leading to an artificial data ecosystem. To the best of our knowledge, this is the first study to aggregate various types of LLM-generated text data, from more tightly constrained data like "task labels" to more lightly constrained "free-form text". We then stress test the quality and implications of LLM-generated artificial data, comparing it with human data across various existing benchmarks.…

Code & Models

Datasets

minnesotanlp/LLM-Artifacts · dataset · 1.2k downloads

Taxonomy

Topics: Topic Modeling · Natural Language Processing Techniques · Explainable Artificial Intelligence (XAI)