DetectRL-X: Towards Reliable Multilingual and Real-World LLM-Generated Text Detection
Junchao Wu, Yefeng Liu, Chenyu Zhu, Hao Zhang, Zeyu Wu, Tianqi Shi, Yichao Du, Longyue Wang, Weihua Luo, Jinsong Su, Derek F. Wong

TL;DR
DetectRL-X is a new multilingual benchmark for evaluating the reliability of LLM-generated text detectors across diverse languages, domains, and real-world modifications.
Contribution
It introduces a comprehensive multilingual benchmark with diverse datasets, attack simulations, and evaluation protocols to assess detector robustness in real-world scenarios.
Findings
Current detectors show varied performance across languages and domains.
Domain, generator type, and attack strategies significantly affect detection accuracy.
DetectRL-X effectively reveals strengths and weaknesses of state-of-the-art detectors.
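A per-slice breakdown is one simple way to surface the kind of variation these findings describe. The sketch below is illustrative only and is not the benchmark's evaluation protocol: the function name `accuracy_by_slice`, the record fields (`language`, `domain`, `label`, `pred`), and the toy records are all assumptions introduced here, not data or code from the paper.

```python
from collections import defaultdict


def accuracy_by_slice(records, key):
    """Group detector predictions by `key` and return {slice_value: accuracy}.

    Each record is a dict with hypothetical fields:
      label: 1 if the text is LLM-generated, 0 if human-written
      pred:  the detector's binary decision
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for r in records:
        total[r[key]] += 1
        correct[r[key]] += int(r["label"] == r["pred"])
    return {k: correct[k] / total[k] for k in total}


# Toy, made-up records for illustration (not results from DetectRL-X).
records = [
    {"language": "en", "domain": "news", "label": 1, "pred": 1},
    {"language": "en", "domain": "reviews", "label": 0, "pred": 0},
    {"language": "de", "domain": "news", "label": 1, "pred": 0},
    {"language": "de", "domain": "reviews", "label": 0, "pred": 0},
]

by_language = accuracy_by_slice(records, "language")
by_domain = accuracy_by_slice(records, "domain")
```

The same helper could be keyed on a generator or attack field, if such fields were recorded, to compare a detector across those dimensions as well.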
Abstract
The effective detection and governance of Large Language Model (LLM)-generated content has become increasingly critical due to the growing risk of misuse. Despite the impressive performance of existing detectors, their reliability and potential in multilingual, real-world scenarios remain largely underexplored. In this study, we introduce DetectRL-X, a comprehensive multilingual benchmark designed to evaluate advanced detectors across 8 dimensions. The benchmark encompasses 8 languages commonly used in commercial contexts and collects human-written texts from 6 domains highly susceptible to LLM misuse. To better align with real-world applications, we create LLM-generated texts using 4 popular commercial LLMs and include typical AI-assisted writing operations such as polishing, expanding, and condensing to capture authentic usage patterns. Furthermore, we develop a multilingual…
