DACTYL: Diverse Adversarial Corpus of Texts Yielded from Large Language Models
Shantanu Thorat, Andrew Caines

TL;DR
This paper introduces DACTYL, a challenging dataset for AI-generated (AIG) text detection that focuses on one-shot and few-shot generations. It shows that existing detectors are vulnerable on such texts and evaluates more robust training methods such as deep X-risk optimization (DXO).
Contribution
The paper presents DACTYL, a dataset for evaluating AIG text detectors on one-shot and few-shot generations, and compares classifier training approaches, finding that DXO classifiers are more robust in out-of-distribution scenarios.
Findings
Existing detectors struggle on DACTYL, especially with one-shot/few-shot texts.
DXO classifiers outperform binary cross-entropy (BCE) classifiers on out-of-distribution data.
DXO classifiers show better generalization in real-world deployment scenarios.
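To make the BCE versus DXO contrast above concrete, here is a minimal sketch in plain Python. It rests on assumptions not stated in this summary: that DXO refers to deep X-risk optimization, and that a pairwise AUC-style surrogate is a reasonable stand-in for that family of objectives. The function names (bce_loss, pairwise_auc_surrogate, empirical_auc) and the toy scores are hypothetical and are not the paper's implementation or results.

```python
import math

def bce_loss(scores, labels):
    """Mean binary cross-entropy; each example is scored independently."""
    eps = 1e-12
    total = 0.0
    for p, y in zip(scores, labels):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(scores)

def pairwise_auc_surrogate(scores, labels, margin=1.0):
    """Mean squared-hinge loss over all (positive, negative) pairs.

    Assumption: this pairwise form only illustrates the kind of
    ranking-based objective grouped under X-risk optimization.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    total = 0.0
    for sp in pos:
        for sn in neg:
            total += max(0.0, margin - (sp - sn)) ** 2
    return total / (len(pos) * len(neg))

def empirical_auc(scores, labels):
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    correct = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                correct += 1.0
            elif sp == sn:
                correct += 0.5
    return correct / (len(pos) * len(neg))

# Toy data: label 1 = AI-generated, label 0 = human-written (hypothetical scores).
toy_scores = [0.9, 0.8, 0.3, 0.1]
toy_labels = [1, 1, 0, 0]
print(bce_loss(toy_scores, toy_labels))
print(pairwise_auc_surrogate(toy_scores, toy_labels))
print(empirical_auc(toy_scores, toy_labels))
```

The design point is that BCE penalizes each score on its own, while the pairwise loss depends only on how AI-generated texts are ranked relative to human texts. That ranking view is one plausible reason such objectives might transfer better to shifted data, though this summary does not state the mechanism.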
Abstract
Existing AIG (AI-generated) text detectors struggle in real-world settings despite succeeding in internal testing, suggesting that they may not be robust enough. We rigorously examine the machine-learning procedure to build these detectors to address this. Most current AIG text detection datasets focus on zero-shot generations, but little work has been done on few-shot or one-shot generations, where LLMs are given human texts as an example. In response, we introduce the Diverse Adversarial Corpus of Texts Yielded from Language models (DACTYL), a challenging AIG text detection dataset focusing on one-shot/few-shot generations. We also include texts from domain-specific continued-pre-trained (CPT) language models, where we fully train all parameters using a memory-efficient optimization approach. Many existing AIG text detectors struggle significantly on our dataset, indicating a…
Taxonomy
Topics: Topic Modeling · Adversarial Robustness in Machine Learning · Hate Speech and Cyberbullying Detection
