Detecting the Use of Generative AI in Crowdsourced Surveys: Implications for Data Integrity
Dapeng Zhang, Marina Katoh, Weiping Pei

TL;DR
This paper investigates the rise of AI-generated responses in crowdsourced surveys post-2022, evaluates detection methods, and discusses implications for research integrity across various fields.
Contribution
It introduces and compares two approaches for detecting AI-generated survey responses, LLM-based detection and signature-based detection, and provides empirical evidence of GenAI's impact on data quality.
Findings
Significant increase in AI-generated responses after 2022
Detection methods can identify AI responses with varying accuracy
GenAI use threatens the validity of survey-based research
Abstract
The widespread adoption of generative AI (GenAI) has introduced new challenges in crowdsourced data collection, particularly in survey-based research. While GenAI offers powerful capabilities, its unintended use in crowdsourcing, such as generating automated survey responses, threatens the integrity of empirical research and complicates efforts to understand public opinion and behavior. In this study, we investigate and evaluate two approaches for detecting AI-generated responses in online surveys: LLM-based detection and signature-based detection. We conducted experiments across seven survey studies, comparing responses collected before 2022 with those collected after the release of ChatGPT. Our findings reveal a significant increase in AI-generated responses in the post-2022 studies, highlighting how GenAI may silently distort crowdsourced data. This work raises broader concerns about…
