AI Generated Text Detection
Adilkhan Alikhanov, Aidar Amangeldi, Diar Demeubay, Dilnaz Akhmetzhan, Nurbek Moldakhmetov, Omar Polat, and Galymzhan Zharas

TL;DR
This paper evaluates AI-generated text detection methods on a new benchmark built from the HC3 and DAIGT v2 datasets, showing that transformer-based models such as DistilBERT outperform traditional approaches in accuracy and robustness.
Contribution
The paper introduces a unified benchmark for AI text detection, built with a topic-based data split that prevents information leakage between training and test data, and compares traditional and transformer-based models on this benchmark.
Findings
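Deep learning models outperform the TF-IDF logistic regression baseline, which reaches 82.87% accuracy.
The BiLSTM classifier achieves 88.86% accuracy; DistilBERT achieves 88.11% accuracy and the highest ROC-AUC, 0.96.
The topic-based data split prevents information leakage, so the reported scores reflect generalization to unseen domains.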
Abstract
The rapid development of large language models has led to an increase in AI-generated text, with students increasingly using LLM-generated content as their own work, which violates academic integrity. This paper presents an evaluation of AI text detection methods, including both traditional machine learning models and transformer-based architectures. We utilize two datasets, HC3 and DAIGT v2, to build a unified benchmark and apply a topic-based data split to prevent information leakage. This approach ensures robust generalization across unseen domains. Our experiments show that TF-IDF logistic regression achieves a reasonable baseline accuracy of 82.87%. However, deep learning models outperform it. The BiLSTM classifier achieves an accuracy of 88.86%, while DistilBERT achieves a similar accuracy of 88.11% with the highest ROC-AUC score of 0.96, demonstrating the strongest overall…
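The topic-based split mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' code: the record schema (`topic`, `text`, `label`), the function name `topic_based_split`, and the toy data are all assumptions made for illustration. The idea shown is only that whole topics, rather than individual texts, are assigned to either the training or the test set, so no topic appears on both sides.

```python
import random
from collections import defaultdict


def topic_based_split(records, test_fraction=0.2, seed=0):
    """Split records so that no topic appears in both train and test.

    `records` is a list of dicts with a "topic" key. This schema is
    hypothetical; the paper's actual data format is not given here.
    """
    # Group records by topic.
    by_topic = defaultdict(list)
    for rec in records:
        by_topic[rec["topic"]].append(rec)

    # Shuffle the topics (not the records) reproducibly.
    topics = sorted(by_topic)
    random.Random(seed).shuffle(topics)

    # Hold out a fraction of the topics, at least one.
    n_test = max(1, round(len(topics) * test_fraction))
    test_topics = set(topics[:n_test])

    train = [r for t in topics if t not in test_topics for r in by_topic[t]]
    test = [r for t in topics if t in test_topics for r in by_topic[t]]
    return train, test


# Toy data: label 0 = human-written, 1 = AI-generated (assumed convention).
records = [
    {"topic": "physics", "text": "human answer A", "label": 0},
    {"topic": "physics", "text": "model answer A", "label": 1},
    {"topic": "history", "text": "human answer B", "label": 0},
    {"topic": "history", "text": "model answer B", "label": 1},
    {"topic": "finance", "text": "human answer C", "label": 0},
    {"topic": "finance", "text": "model answer C", "label": 1},
    {"topic": "medicine", "text": "human answer D", "label": 0},
    {"topic": "medicine", "text": "model answer D", "label": 1},
]

train, test = topic_based_split(records, test_fraction=0.25, seed=0)
train_topics = {r["topic"] for r in train}
test_topics = {r["topic"] for r in test}
print("train topics:", sorted(train_topics))
print("test topics:", sorted(test_topics))
```

Note that `test_fraction` here is a fraction of topics, not of rows, so the resulting row counts depend on how many texts each topic contains. A random row-level split would instead let texts on the same topic land in both sets, which is the leakage the abstract says the topic-based split is meant to prevent.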
Taxonomy
Topics: Academic integrity and plagiarism · Topic Modeling · Text Readability and Simplification
