Findings of the Counter Turing Test: AI-Generated Text Detection

Rajarshi Roy; Gurpreet Singh; Ashhar Aziz; Shashwat Bajpai; Nasrin Imanpour; Shwetangshu Biswas; Kapil Wanaskar; Parth Patwa; Subhankar Ghosh; Shreyas Dixit; Nilesh Ranjan Pal; Vipula Rawte; Ritvik Garimella; Amitava Das; Amit Sheth; Vasu Sharma; Aishwarya Naresh Reganti; Vinija Jain; and Aman Chadha

arXiv:2605.20761·cs.CL·May 21, 2026

TL;DR

This paper analyzes state-of-the-art AI-generated text detection techniques through the Counter Turing Test (CT2) shared tasks, reporting strong results in binary classification but continued difficulty in model attribution.

Contribution

It provides a comprehensive evaluation of detection methods, emphasizing their effectiveness in binary classification and the need for improved attribution techniques.

Findings

01 The top system achieved an F1 score of 1.0000 in binary classification (Task A).

02 Model attribution (Task B) scores were lower, with the best system reaching 0.9531.

03 Transformer-based and ensemble methods were the most effective.
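
The findings above are stated as F1 scores. As a rough illustration of how such scores can be computed for a two-class task and a multi-class attribution task, here is a minimal sketch in plain Python. The labels (`human`, `ai`, `model_a`, ...) and the toy predictions are hypothetical, and macro averaging for the multi-class case is an assumption: this summary does not say which averaging the shared task used.

```python
def f1_for_label(y_true, y_pred, label):
    """F1 for one class: harmonic mean of precision and recall."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    if tp == 0:
        return 0.0  # no true positives: precision or recall is zero
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 (assumed averaging, see lead-in)."""
    labels = sorted(set(y_true) | set(y_pred))
    return sum(f1_for_label(y_true, y_pred, l) for l in labels) / len(labels)


# Toy binary task (hypothetical data): every prediction is correct.
binary_true = ["human", "ai", "ai", "human"]
binary_pred = ["human", "ai", "ai", "human"]
binary_f1 = f1_for_label(binary_true, binary_pred, "ai")

# Toy attribution task (hypothetical labels): one model_b sample
# is mistaken for model_c, which lowers two of the per-class scores.
attr_true = ["model_a", "model_a", "model_b", "model_b", "model_c", "model_c"]
attr_pred = ["model_a", "model_a", "model_b", "model_c", "model_c", "model_c"]
attr_f1 = macro_f1(attr_true, attr_pred)

print(f"binary F1: {binary_f1:.4f}")
print(f"attribution macro F1: {attr_f1:.4f}")
```

In this toy setup a single confusion between two generators costs the attribution score while the binary score is untouched, which is one intuition for why attribution tends to be the harder of the two tasks.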

Abstract

The rapid proliferation of AI-generated text has introduced significant challenges in maintaining the integrity of digital content. Advanced generative models such as GPT-4, Claude 3.5, and Llama can produce highly coherent and human-like text, making it increasingly difficult to differentiate between human-written and AI-generated content. While these models have transformative applications, their misuse has raised concerns about misinformation, biased narratives, and security threats. This paper provides a comprehensive analysis of state-of-the-art AI-generated text detection techniques and evaluates their effectiveness through the Counter Turing Test (CT2) shared tasks. Task A (Binary Classification) required participants to distinguish between human-written and AI-generated text, while Task B (Model Attribution) focused on identifying the specific language model responsible for…
