Evaluating LLM-Contaminated Crowdsourcing Data Without Ground Truth

Yichi Zhang; Jinlong Pang; Zhaowei Zhu; Yang Liu

arXiv:2506.06991 · cs.AI · November 7, 2025

TL;DR

This paper proposes a training-free peer prediction method for detecting LLM-assisted cheating in crowdsourcing annotation tasks. The method assesses data quality without relying on ground truth or high-dimensional training data.

Contribution

It introduces a novel, theoretically guaranteed scoring mechanism that mitigates LLM collusion in crowdsourcing without requiring training data.

Findings

01. Effective detection of low-effort cheating demonstrated on real datasets

02. Method is robust against LLM collusion and does not need ground truth

03. Theoretical guarantees provided for the proposed scoring mechanism

Abstract

The recent success of generative AI highlights the crucial role of high-quality human feedback in building trustworthy AI systems. However, the increasing use of large language models (LLMs) by crowdsourcing workers poses a significant challenge: datasets intended to reflect human input may be compromised by LLM-generated responses. Existing LLM detection approaches often rely on high-dimensional training data such as text, making them unsuitable for annotation tasks like multiple-choice labeling. In this work, we investigate the potential of peer prediction -- a mechanism that evaluates the information within workers' responses without using ground truth -- to mitigate LLM-assisted cheating in crowdsourcing with a focus on annotation tasks. Our approach quantifies the correlations between worker answers while conditioning on (a subset of) LLM-generated labels available to the…
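To make the idea of "correlations between worker answers while conditioning on LLM-generated labels" concrete, here is a minimal Python sketch. It is not the paper's scoring mechanism: it assumes empirical conditional mutual information as one possible way to measure how much two workers' answers agree beyond what an LLM label already explains. The function name `conditional_mutual_information` and the toy label lists are hypothetical and exist only for illustration.

```python
from collections import Counter
from math import log


def conditional_mutual_information(a, b, z):
    """Empirical I(A; B | Z) in nats for three equal-length label sequences.

    a, b: answers of two workers on the same tasks (hypothetical inputs).
    z:    LLM-generated labels for those tasks (assumed available for all tasks).
    """
    if not (len(a) == len(b) == len(z)):
        raise ValueError("all sequences must have the same length")
    n = len(a)
    if n == 0:
        return 0.0

    # Joint and marginal counts needed for p(a, b | z), p(a | z), p(b | z).
    n_abz = Counter(zip(a, b, z))
    n_az = Counter(zip(a, z))
    n_bz = Counter(zip(b, z))
    n_z = Counter(z)

    total = 0.0
    for (x, y, w), c in n_abz.items():
        # p(x, y, w) * log( p(x, y | w) / (p(x | w) * p(y | w)) )
        total += (c / n) * log(c * n_z[w] / (n_az[(x, w)] * n_bz[(y, w)]))
    return total


# Toy data (made up): the LLM label is uninformative about the two honest
# workers' answers, and the honest workers agree with each other.
llm_labels = [0, 0, 0, 0, 1, 1, 1, 1]
honest_a = [0, 1, 0, 1, 0, 1, 0, 1]
honest_b = [0, 1, 0, 1, 0, 1, 0, 1]
copier = list(llm_labels)  # a worker who simply submits the LLM label

score_honest = conditional_mutual_information(honest_a, honest_b, llm_labels)
score_copier = conditional_mutual_information(copier, honest_b, llm_labels)
```

In this toy setup the pair of honest workers retains positive conditional dependence, while the worker who copies the LLM label scores zero, because their answer carries no information once the LLM label is known. Whether a real scoring rule should use this quantity, and how it behaves when only a subset of LLM labels is available, is left open here.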

Videos

Evaluating LLM-Contaminated Crowdsourcing Data Without Ground Truth · SlidesLive

Taxonomy

Topics: Mobile Crowdsensing and Crowdsourcing · Ethics and Social Impacts of AI · Open Source Software Innovations

Methods: Focus