Safeguarding Crowdsourcing Surveys from ChatGPT with Prompt Injection
Chaofan Wang, Samuel Kernan Freire, Mo Zhang, Jing Wei, Jorge, Goncalves, Vassilis Kostakos, Zhanna Sarsenbayeva, Christina Schneegass,, Alessandro Bozzon, Evangelos Niforatos

TL;DR
This paper introduces a prompt injection-based method to detect LLM-generated responses in crowdsourcing surveys, achieving over 93% detection accuracy to maintain survey integrity.
Contribution
It proposes a novel prompt injection technique for identifying LLM responses and provides an open-source tool for survey designers to ensure data quality.
Findings
Detects LLM responses with over 93% accuracy
Effective across various question types and positions
Provides an open-source implementation for practical use
Abstract
ChatGPT and other large language models (LLMs) have proven useful in crowdsourcing tasks, where they can effectively annotate machine learning training data. However, this means that they also have the potential for misuse, specifically to automatically answer surveys. LLMs can potentially circumvent quality assurance measures, thereby threatening the integrity of methodologies that rely on crowdsourcing surveys. In this paper, we propose a mechanism to detect LLM-generated responses to surveys. The mechanism uses "prompt injection", such as directions that can mislead LLMs into giving predictable responses. We evaluate our technique against a range of question scenarios, types, and positions, and find that it can reliably detect LLM-generated responses with more than 93% effectiveness. We also provide an open-source software to help survey designers use our technique to detect LLM…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsMobile Crowdsensing and Crowdsourcing · Privacy-Preserving Technologies in Data · Topic Modeling
