Beyond Easy Wins: A Text Hardness-Aware Benchmark for LLM-generated Text Detection
Navid Ayoobi, Sadat Shahriar, Arjun Mukherjee

TL;DR
This paper introduces SHIELD, a new benchmark for AI text detectors that emphasizes real-world reliability and stability, and proposes a hardness-aware framework to challenge and improve detection methods.
Contribution
The paper presents a comprehensive evaluation benchmark and a novel humanification framework that together enhance the assessment of AI text detection systems in practical scenarios.
Findings
SHIELD effectively evaluates detector reliability and stability.
Hardness-aware humanification challenges state-of-the-art detectors.
Benchmark and framework improve practical detection robustness.
Abstract
We present a novel evaluation paradigm for AI text detectors that prioritizes real-world and equitable assessment. Current approaches predominantly report conventional metrics like AUROC, overlooking that even modest false positive rates constitute a critical impediment to practical deployment of detection systems. Furthermore, real-world deployment necessitates predetermined threshold configuration, making detector stability (i.e., the maintenance of consistent performance across diverse domains and adversarial scenarios) a critical factor. These aspects have been largely ignored in previous research and benchmarks. Our benchmark, SHIELD, addresses these limitations by integrating both reliability and stability factors into a unified evaluation metric designed for practical assessment. Furthermore, we develop a post-hoc, model-agnostic humanification framework that modifies AI text to…
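To make the abstract's point about predetermined thresholds concrete, here is a minimal sketch. It is not the SHIELD metric, which this summary does not specify. It assumes a detector that outputs a score per text and flags a text as AI-generated when the score exceeds a threshold. The function names and all score values are hypothetical, chosen only for illustration. The threshold is calibrated on human text from one domain to meet a target false positive rate, then applied unchanged to a second domain.

```python
from typing import Sequence


def threshold_at_fpr(human_scores: Sequence[float], target_fpr: float) -> float:
    """Return a threshold whose false positive rate on human_scores is <= target_fpr.

    A text is flagged as AI-generated when its score is strictly above the threshold.
    """
    ranked = sorted(human_scores, reverse=True)
    allowed = int(target_fpr * len(ranked))  # how many human texts may be flagged
    if allowed >= len(ranked):
        return float("-inf")
    # Only scores strictly above ranked[allowed] are flagged: at most `allowed` of them.
    return ranked[allowed]


def rate_above(scores: Sequence[float], threshold: float) -> float:
    """Fraction of scores strictly above the threshold."""
    return sum(s > threshold for s in scores) / len(scores)


# Hypothetical detector scores (higher means "more likely AI-generated").
human_a = [0.05, 0.10, 0.15, 0.20, 0.25, 0.30, 0.35, 0.40, 0.45, 0.50]
ai_a = [0.55, 0.60, 0.70, 0.80, 0.90]
human_b = [0.30, 0.40, 0.50, 0.60, 0.70]  # a domain where human text scores higher
ai_b = [0.40, 0.50, 0.65, 0.80]

# Calibrate once on domain A, then keep the threshold fixed, as deployment requires.
threshold = threshold_at_fpr(human_a, target_fpr=0.1)

fpr_a, tpr_a = rate_above(human_a, threshold), rate_above(ai_a, threshold)
fpr_b, tpr_b = rate_above(human_b, threshold), rate_above(ai_b, threshold)

print(f"threshold={threshold:.2f}")
print(f"domain A: FPR={fpr_a:.2f} TPR={tpr_a:.2f}")
print(f"domain B: FPR={fpr_b:.2f} TPR={tpr_b:.2f}")
```

In this toy data the threshold meets the 10% false positive target on domain A, yet the same fixed threshold flags most human texts in domain B. A threshold-free summary such as AUROC would not, on its own, reveal this kind of instability.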
