Interpretable Predictability-Based AI Text Detection: A Replication Study
Adam Skurla, Dominik Macko, Jakub Simko

TL;DR
This study replicates and extends a system for detecting machine-generated texts, demonstrating that adding stylometric features and using multilingual models improves detection accuracy across languages.
Contribution
This work introduces a multilingual detection system with added stylometric features, analyzes how individual features influence its decisions, improves on earlier models, and highlights the importance of clear documentation for reproducibility.
Findings
Stylometric features improve detection performance
Multilingual models perform comparably to or better than language-specific models
Clear documentation is crucial for reliable replication
Abstract
This paper replicates and extends the system used in the AuTexTification 2023 shared task for authorship attribution of machine-generated texts. First, we tried to reproduce the original results. Exact replication was not possible because of differences in data splits, model availability, and implementation details. Next, we tested newer multilingual language models and added 26 document-level stylometric features. We also applied SHAP analysis to examine which features influence the model's decisions. We replaced the original GPT-2 models with newer generative models such as Qwen and mGPT for computing probabilistic features. For contextual representations, we used mDeBERTa-v3-base and applied the same configuration to both English and Spanish. This allowed us to use one shared configuration for Subtask 1 and Subtask 2. Our experiments show that the additional stylometric features…
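To make the two feature families named in the abstract more concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation: the function names `stylometric_features` and `predictability_features` are hypothetical, the four style features are generic examples and not the 26 features used in the paper, and the per-token log-probabilities are assumed to have been computed beforehand by some causal language model (the paper uses models such as Qwen and mGPT for this step, which is not shown here).

```python
import math
import re
from statistics import mean, pstdev


def stylometric_features(text: str) -> dict[str, float]:
    """Illustrative document-level style features (NOT the paper's 26 features)."""
    words = re.findall(r"\w+", text.lower())
    # Naive sentence split on terminal punctuation; an assumption for this sketch.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    n_words = max(len(words), 1)  # guard against empty input
    return {
        "type_token_ratio": len(set(words)) / n_words,
        "mean_word_length": sum(len(w) for w in words) / n_words,
        "mean_sentence_length": len(words) / max(len(sentences), 1),
        "punctuation_ratio": sum(ch in ".,;:!?" for ch in text) / max(len(text), 1),
    }


def predictability_features(token_logprobs: list[float]) -> dict[str, float]:
    """Summarize per-token natural-log probabilities from a causal LM.

    The log-probabilities are assumed to be precomputed; how they are
    obtained (model, tokenizer, context length) is outside this sketch.
    """
    if not token_logprobs:
        raise ValueError("token_logprobs must not be empty")
    avg = mean(token_logprobs)
    return {
        "mean_logprob": avg,
        "std_logprob": pstdev(token_logprobs),
        "min_logprob": min(token_logprobs),
        # Perplexity is the exponential of the negative mean log-probability.
        "perplexity": math.exp(-avg),
    }


if __name__ == "__main__":
    print(stylometric_features("The cat sat. The cat ran!"))
    print(predictability_features([math.log(0.5)] * 4))
```

In a pipeline like the one the abstract describes, vectors of this kind would be concatenated with contextual representations before classification; that wiring, and the SHAP analysis on top of it, are left out here because the summary does not specify them.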
Taxonomy
Topics: Authorship Attribution and Profiling · Topic Modeling · Hate Speech and Cyberbullying Detection
