Testing of Detection Tools for AI-Generated Text

Debora Weber-Wulff (University of Applied Sciences HTW Berlin, Germany); Alla Anohina-Naumeca (Riga Technical University, Latvia); Sonja Bjelobaba (Uppsala University, Sweden); Tomáš Foltýnek (Masaryk University, Czechia); Jean Guerrero-Dib (Universidad de Monterrey, Mexico); Olumide Popoola (Queen Mary University of London, UK); Petr Šigut (Masaryk University, Czechia); Lorna Waddington (University of Leeds, UK)

arXiv:2306.15666·cs.CL·December 27, 2023·33 cites

Open Access

TL;DR

This study evaluates 14 detection tools for AI-generated text and finds that they have limited accuracy, tend to classify text as human-written, and are vulnerable to obfuscation techniques, which raises concerns about their reliability in academic contexts.

Contribution

The paper provides a comprehensive evaluation of detection tools, highlighting their limitations and the impact of obfuscation, and discusses implications for academic integrity.

Findings

01 Detection tools are unreliable and biased towards classifying text as human-written.

02 Content obfuscation significantly reduces detection accuracy.

03 Current tools cannot reliably distinguish AI-generated from human-written text.

Abstract

Recent advances in generative pre-trained transformer large language models have emphasised the potential risks of unfair use of artificial intelligence (AI) generated content in an academic environment and intensified efforts in searching for solutions to detect such content. The paper examines the general functionality of detection tools for artificial intelligence generated text and evaluates them based on accuracy and error type analysis. Specifically, the study seeks to answer research questions about whether existing detection tools can reliably differentiate between human-written text and ChatGPT-generated text, and whether machine translation and content obfuscation techniques affect the detection of AI-generated text. The research covers 12 publicly available tools and two commercial systems (Turnitin and PlagiarismCheck) that are widely used in the academic setting. The…
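The abstract mentions evaluation by accuracy and error type analysis. As a rough illustration of what such an evaluation can look like, here is a minimal Python sketch. It assumes a simplified binary labelling ("human" or "ai") per document; the function name `error_analysis` and the sample labels are hypothetical and are not taken from the paper, whose actual scoring scheme may differ.

```python
from collections import Counter


def error_analysis(truth, predicted):
    """Compute accuracy and error rates for a binary AI-text detector.

    truth, predicted: equal-length sequences of "human" or "ai" labels.
    A false positive is human-written text flagged as AI-generated;
    a false negative is AI-generated text classified as human-written.
    """
    if len(truth) != len(predicted):
        raise ValueError("truth and predicted must have the same length")

    counts = Counter(zip(truth, predicted))
    tp = counts[("ai", "ai")]        # AI text correctly detected
    tn = counts[("human", "human")]  # human text correctly passed
    fp = counts[("human", "ai")]     # human text wrongly flagged
    fn = counts[("ai", "human")]     # AI text missed
    total = tp + tn + fp + fn

    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "false_positive_rate": fp / (fp + tn) if (fp + tn) else 0.0,
        "false_negative_rate": fn / (fn + tp) if (fn + tp) else 0.0,
    }


# Made-up toy labels for illustration only (not data from the study).
truth = ["human", "human", "ai", "ai", "ai", "ai"]
predicted = ["human", "human", "ai", "human", "human", "ai"]
result = error_analysis(truth, predicted)
print(result)
```

In this toy case the detector never flags human text but misses half of the AI-generated documents, which is the kind of lean towards the "human" label that a false negative rate makes visible and that plain accuracy alone would hide.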

Taxonomy

Topics: Artificial Intelligence in Healthcare and Education