BUSTED at AraGenEval Shared Task: A Comparative Study of Transformer-Based Models for Arabic AI-Generated Text Detection

Ali Zain; Sareem Farooqui; Muhammad Rafi

arXiv:2510.20610 · cs.CL · October 28, 2025

TL;DR

This study compares three fine-tuned transformer models for detecting Arabic AI-generated text. The multilingual XLM-RoBERTa achieved the highest F1 score (0.7701), outperforming the Arabic-specific AraELECTRA and CAMeLBERT.

Contribution

The paper shows that a multilingual transformer (XLM-RoBERTa) can outperform Arabic-specific models (AraELECTRA and CAMeLBERT) on Arabic AI-generated text detection.

Findings

1. XLM-RoBERTa achieved the highest F1 score, 0.7701.

2. The multilingual XLM-RoBERTa outperformed the Arabic-specific AraELECTRA and CAMeLBERT.

3. The work highlights the potential of generalist multilingual models in language-specific tasks.

Abstract

This paper details our submission to the AraGenEval Shared Task on Arabic AI-generated text detection, where our team, BUSTED, secured 5th place. We investigated the effectiveness of three pre-trained transformer models: AraELECTRA, CAMeLBERT, and XLM-RoBERTa. Our approach involved fine-tuning each model on the provided dataset for a binary classification task. Our findings revealed a surprising result: the multilingual XLM-RoBERTa model achieved the highest performance with an F1 score of 0.7701, outperforming the specialized Arabic models. This work underscores the complexities of AI-generated text detection and highlights the strong generalization capabilities of multilingual models.
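
The abstract reports results as an F1 score for a binary classification task. As a minimal sketch of how that metric is computed from predictions, the snippet below assumes the F1 is measured on the AI-generated class (label 1). The abstract does not state whether the shared task used positive-class or macro-averaged F1, so that choice is an assumption. The function name `binary_f1` and the toy labels are hypothetical and are not taken from the paper or the shared-task data.

```python
from typing import Sequence


def binary_f1(y_true: Sequence[int], y_pred: Sequence[int], positive: int = 1) -> float:
    """Return the F1 score for the `positive` class (assumed: 1 = AI-generated)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    # Count true positives, false positives, and false negatives.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)

    # With no true positives, precision and recall are zero or undefined.
    if tp == 0:
        return 0.0

    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # F1 is the harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


# Toy labels for illustration only (not data from the shared task).
gold = [1, 1, 1, 0, 0, 0]
pred = [1, 1, 0, 1, 0, 0]
print(f"F1 = {binary_f1(gold, pred):.4f}")
```

Because F1 combines precision and recall, it penalizes a detector that flags everything as AI-generated as well as one that rarely flags anything, which plain accuracy can hide on imbalanced data.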
