The Arabic AI Fingerprint: Stylometric Analysis and Detection of Large Language Models Text
Maged S. Al-Shaibani, Moataz Ahmed

TL;DR
This paper investigates Arabic machine-generated text, identifying its linguistic signatures and developing detection models that achieve high accuracy (up to 99.9% F1-score in formal contexts), with the aim of preserving information integrity across diverse Arabic-language domains.
Contribution
It provides the most comprehensive analysis of Arabic LLM-generated text, combining multiple generation strategies, model architectures, and stylometric analysis to develop effective detection methods.
Findings
Detectable linguistic signatures in Arabic LLM outputs
Detection models achieve up to 99.9% F1-score in formal contexts
Cross-domain generalization challenges are confirmed
Abstract
Large Language Models (LLMs) have achieved unprecedented capabilities in generating human-like text. This poses subtle yet significant challenges to information integrity across critical domains such as education, social media, and academia, where such text can enable sophisticated misinformation campaigns, compromise healthcare guidance, and facilitate targeted propaganda. The challenge is especially severe for under-explored, low-resource languages such as Arabic. This paper presents a comprehensive investigation of Arabic machine-generated text, examining multiple generation strategies (generation from the title only, content-aware generation, and text refinement) across diverse model architectures (ALLaM, Jais, Llama, and GPT-4) in the academic and social media domains. Our stylometric analysis reveals distinctive linguistic patterns differentiating human-written from machine-generated…
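The abstract does not say which stylometric features the authors use. Purely as an illustration of what stylometric features can look like for Arabic text, here is a minimal sketch that computes a few generic, hypothetical surface statistics: average word length, type-token ratio, punctuation per token, and the share of tokens carrying the definite article "ال". These features, the function `stylometric_features`, and the `StyleFeatures` type are assumptions for illustration and are not the paper's feature set or detection method.

```python
import re
from dataclasses import dataclass

# Hypothetical preprocessing: strip Arabic diacritics (U+064B..U+0652) and tatweel (U+0640).
DIACRITICS_RE = re.compile(r"[\u064B-\u0652\u0640]")
# Unicode-aware word tokens; in Python 3, \w matches Arabic letters.
TOKEN_RE = re.compile(r"\w+")
# Latin punctuation plus the Arabic comma, semicolon, and question mark.
PUNCT = set(".,;:!?\u060C\u061B\u061F")


@dataclass
class StyleFeatures:
    avg_word_len: float
    type_token_ratio: float
    punct_per_token: float
    definite_article_rate: float


def stylometric_features(text: str) -> StyleFeatures:
    """Compute a few illustrative (hypothetical) stylometric features."""
    text = DIACRITICS_RE.sub("", text)
    tokens = TOKEN_RE.findall(text)
    n = len(tokens)
    if n == 0:
        return StyleFeatures(0.0, 0.0, 0.0, 0.0)
    punct = sum(1 for ch in text if ch in PUNCT)
    # Tokens that begin with the definite article and have a stem after it.
    with_al = sum(1 for t in tokens if t.startswith("ال") and len(t) > 2)
    return StyleFeatures(
        avg_word_len=sum(len(t) for t in tokens) / n,
        type_token_ratio=len(set(tokens)) / n,
        punct_per_token=punct / n,
        definite_article_rate=with_al / n,
    )


if __name__ == "__main__":
    print(stylometric_features("ذهب الولد إلى المدرسة، ثم عاد إلى البيت."))
```

In a detection setting, vectors like these could be compared between human-written and machine-generated samples or fed to a simple classifier. Whether the paper uses comparable features, and which classifiers it trains, is not stated in this summary.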
Taxonomy
Topics: Misinformation and Its Impacts · Topic Modeling · Authorship Attribution and Profiling
