Why AI-Generated Text Detection Fails: Evidence from Explainable AI Beyond Benchmark Accuracy

Shushanta Pudasaini; Luis Miralles-Pechuán; David Lillis; Marisa Llorens Salvador

arXiv:2603.23146·cs.CL·April 23, 2026

TL;DR

This paper critically examines the reliability of AI-generated text detectors, showing that despite high benchmark scores they depend on dataset-specific cues and generalize poorly, and proposes an explainable, linguistics-based detection framework.

Contribution

The authors introduce an interpretable detection framework using linguistic features and explainable AI, highlighting the limitations of current models in cross-domain robustness.
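
To make "linguistic features" concrete, here is a minimal sketch of what hand-crafted, interpretable features can look like. The three features and the function name `linguistic_features` are illustrative assumptions; this page does not list the 30 features the authors use, and this is not their implementation.

```python
import re


def linguistic_features(text):
    """Return a few illustrative stylometric features.

    These are hypothetical examples, not the paper's 30 features.
    """
    # Split on sentence-ending punctuation and drop empty fragments.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    # Lowercased word tokens (letters and apostrophes only).
    words = re.findall(r"[A-Za-z']+", text.lower())
    n_words = len(words) or 1  # avoid division by zero on empty input
    return {
        # Lexical diversity: distinct words divided by total words.
        "type_token_ratio": len(set(words)) / n_words,
        # Average number of words per sentence.
        "mean_sentence_len": len(words) / (len(sentences) or 1),
        # Commas per word, a crude punctuation-style signal.
        "comma_rate": text.count(",") / n_words,
    }


features = linguistic_features("The cat sat. The dog ran, fast.")
```

Because each feature has a direct linguistic reading, an explainability method applied to a classifier built on such features can report which stylistic signals drive a prediction.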

Findings

01

On in-domain benchmark data (PAN CLEF 2025 and COLING 2025), the authors' model trained on 30 linguistic features attains an F1 score of 0.9734.

02

Significant performance degradation occurs under domain shift and across different generators.

03

Detectors rely on dataset-specific stylistic cues rather than stable indicators of machine authorship.
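
The gap between in-domain and shifted performance described in these findings can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the single-feature `threshold_detector`, the helper `evaluate`, and the two tiny sample lists are invented and are not the paper's data, model, or results.

```python
def f1_score(y_true, y_pred, positive=1):
    """F1 for the positive class, computed from raw counts."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def threshold_detector(feature_value, threshold=0.5):
    """Hypothetical detector: flag text as AI-generated (1) when a
    single stylistic feature falls below the threshold."""
    return 1 if feature_value < threshold else 0


def evaluate(samples, threshold=0.5):
    """Score the detector on (feature_value, label) pairs."""
    y_true = [label for _, label in samples]
    y_pred = [threshold_detector(value, threshold) for value, _ in samples]
    return f1_score(y_true, y_pred)


# Toy samples (made up): label 1 = AI-generated, 0 = human-written.
# In the first set the feature separates the classes cleanly; in the
# second the same cue no longer lines up with authorship.
in_domain = [(0.30, 1), (0.35, 1), (0.70, 0), (0.80, 0)]
shifted = [(0.60, 1), (0.40, 1), (0.45, 0), (0.90, 0)]

in_domain_f1 = evaluate(in_domain)
shifted_f1 = evaluate(shifted)
```

A threshold tuned to a cue that happens to separate classes in one corpus scores perfectly there and drops once the cue's distribution changes, which is the kind of failure a cross-domain or cross-generator evaluation is designed to expose.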

Abstract

The widespread adoption of Large Language Models (LLMs) has made the detection of AI-generated text a pressing and complex challenge. Although many detection systems report high benchmark accuracy, their reliability in real-world settings remains uncertain, and their interpretability is often unexplored. In this work, we investigate whether contemporary detectors genuinely identify machine authorship or merely exploit dataset-specific artefacts. We propose an interpretable detection framework that integrates linguistic feature engineering, machine learning, and explainable AI techniques. When evaluated on two prominent benchmark corpora, namely PAN CLEF 2025 and COLING 2025, our model trained on 30 linguistic features achieves leaderboard-competitive performance, attaining an F1 score of 0.9734. However, systematic cross-domain and cross-generator evaluation reveals substantial…
