The Arabic AI Fingerprint: Stylometric Analysis and Detection of Large Language Models Text

Maged S. Al-Shaibani; Moataz Ahmed

arXiv:2505.23276·cs.CL·June 5, 2025

Open Access · 1 Repo · 2 Datasets

TL;DR

This paper investigates Arabic machine-generated text, identifies the linguistic signatures that distinguish it from human writing, and develops detection models that achieve high accuracy, with the aim of protecting information integrity across diverse Arabic-language domains.

Contribution

The paper presents a comprehensive analysis of Arabic LLM-generated text, combining multiple generation strategies, multiple model architectures, and stylometric analysis to develop effective detection methods.

Findings

01 Detectable linguistic signatures in Arabic LLM outputs

02 Detection models achieve up to 99.9% F1-score in formal contexts

03 Cross-domain generalization challenges are confirmed
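For readers unfamiliar with the metric in the second finding, F1 is the harmonic mean of precision and recall. The snippet below is a minimal sketch of that standard definition for a binary human-vs-machine labeling; the function name `f1_score` and the convention that label `1` means "machine-generated" are assumptions made for illustration and are not taken from the paper or its repository.

```python
def f1_score(y_true, y_pred, positive=1):
    """Binary F1 for the `positive` class (assumed here: 1 = machine-generated)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision or recall is zero, so F1 is zero.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical labels: one machine-generated text is missed by the detector.
y_true = [1, 1, 0, 0]
y_pred = [1, 0, 0, 0]
print(round(f1_score(y_true, y_pred), 4))
```

Because F1 penalizes both missed machine text and falsely flagged human text, a score near 99.9% implies that both error types are rare on the evaluated data.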

Abstract

Large Language Models (LLMs) have achieved unprecedented capabilities in generating human-like text, posing subtle yet significant challenges for information integrity across critical domains, including education, social media, and academia. These capabilities enable sophisticated misinformation campaigns, compromise healthcare guidance, and facilitate targeted propaganda. The challenge is particularly severe in under-explored and low-resource languages like Arabic. This paper presents a comprehensive investigation of Arabic machine-generated text, examining multiple generation strategies (generation from the title only, content-aware generation, and text refinement) across diverse model architectures (ALLaM, Jais, Llama, and GPT-4) in academic and social media domains. Our stylometric analysis reveals distinctive linguistic patterns differentiating human-written from machine-generated…
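The truncated abstract does not say which stylometric features the authors measure. Purely as an illustration of what a stylometric comparison can look like, the sketch below computes three generic surface features (type-token ratio, average word length, average sentence length). The function name `stylometric_features`, the feature choice, and the regex tokenization are all assumptions, not the paper's method; in particular, `\w+` splits words at Arabic diacritics, so diacritized text would need normalization first.

```python
import re


def stylometric_features(text):
    """Generic surface features; an illustrative choice, not the paper's feature set."""
    # Split on Latin and Arabic sentence-ending punctuation.
    sentences = [s for s in re.split(r"[.!?؟]+", text) if s.strip()]
    # \w+ matches Unicode letters and digits, including undiacritized Arabic.
    words = re.findall(r"\w+", text)
    if not words or not sentences:
        return {"n_tokens": 0, "n_sentences": 0, "type_token_ratio": 0.0,
                "avg_word_len": 0.0, "avg_sentence_len": 0.0}
    return {
        "n_tokens": len(words),
        "n_sentences": len(sentences),
        # Lexical diversity: unique words divided by total words.
        "type_token_ratio": len(set(words)) / len(words),
        "avg_word_len": sum(len(w) for w in words) / len(words),
        # Sentence length measured in words.
        "avg_sentence_len": len(words) / len(sentences),
    }


# Hypothetical sample text, not drawn from the paper's datasets.
sample = "ذهب الولد إلى المدرسة. ذهب الولد إلى البيت."
print(stylometric_features(sample))
```

Computing such features separately for human-written and machine-generated samples, then comparing their distributions, is one simple way to look for the kind of distinguishing patterns the abstract describes.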

Code & Models

Repositories

kfupm-jrcai/arabic-text-detection (PyTorch, official)

Datasets

Taxonomy

Topics: Misinformation and Its Impacts · Topic Modeling · Authorship Attribution and Profiling