AI Generated Text Detection

Adilkhan Alikhanov, Aidar Amangeldi, Diar Demeubay, Dilnaz Akhmetzhan, Nurbek Moldakhmetov, Omar Polat, and Galymzhan Zharas

arXiv:2601.03812·cs.CL·January 8, 2026

TL;DR

This paper evaluates AI-generated text detection methods on a unified benchmark built from the HC3 and DAIGT v2 datasets. Deep learning models outperform the TF-IDF logistic regression baseline (82.87% accuracy): a BiLSTM classifier reaches 88.86% accuracy, and DistilBERT reaches 88.11% accuracy with the highest ROC-AUC (0.96).

Contribution

The paper introduces a unified benchmark for AI text detection, applies a topic-based data split to prevent information leakage and test generalization to unseen domains, and compares traditional and transformer-based models on this benchmark.
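One way to picture a topic-based split is as a grouped split in which every sample carrying a given topic label lands entirely in either the training set or the test set. The sketch below is illustrative only and is not the authors' procedure: it assumes each sample already has a topic label, uses scikit-learn's `GroupShuffleSplit`, and the `topic_split` helper and the toy data are hypothetical.

```python
from sklearn.model_selection import GroupShuffleSplit


def topic_split(texts, labels, topics, test_size=0.25, seed=0):
    """Return (train_idx, test_idx) such that no topic appears on both sides.

    Hypothetical helper: `topics` is assumed to hold one topic label per sample.
    """
    splitter = GroupShuffleSplit(n_splits=1, test_size=test_size, random_state=seed)
    train_idx, test_idx = next(splitter.split(texts, labels, groups=topics))
    return list(train_idx), list(test_idx)


# Toy data, invented for illustration (label 0 = human, 1 = AI-generated).
texts = [
    "essay on climate a", "essay on climate b",
    "essay on history a", "essay on history b",
    "essay on finance a", "essay on finance b",
    "essay on biology a", "essay on biology b",
]
labels = [0, 1, 0, 1, 0, 1, 0, 1]
topics = ["climate", "climate", "history", "history",
          "finance", "finance", "biology", "biology"]

train_idx, test_idx = topic_split(texts, labels, topics)
train_topics = {topics[i] for i in train_idx}
test_topics = {topics[i] for i in test_idx}
print("train topics:", sorted(train_topics))
print("test topics:", sorted(test_topics))
```

Because whole topics are held out, a classifier cannot score well on the test set merely by memorizing topic-specific vocabulary seen during training.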

Findings

01. Transformer models outperform traditional ML models.

02. DistilBERT achieves 88.11% accuracy and 0.96 ROC-AUC.

03. Topic-based data split enhances model robustness.

Abstract

The rapid development of large language models has led to an increase in AI-generated text, with students increasingly using LLM-generated content as their own work, which violates academic integrity. This paper presents an evaluation of AI text detection methods, including both traditional machine learning models and transformer-based architectures. We utilize two datasets, HC3 and DAIGT v2, to build a unified benchmark and apply a topic-based data split to prevent information leakage. This approach ensures robust generalization across unseen domains. Our experiments show that TF-IDF logistic regression achieves a reasonable baseline accuracy of 82.87%. However, deep learning models outperform it. The BiLSTM classifier achieves an accuracy of 88.86%, while DistilBERT achieves a similar accuracy of 88.11% with the highest ROC-AUC score of 0.96, demonstrating the strongest overall…
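For readers unfamiliar with the baseline and metrics named in the abstract, the following is a minimal sketch of a TF-IDF plus logistic regression classifier scored with accuracy and ROC-AUC. It is not the authors' code: the hyperparameters are scikit-learn defaults (apart from `max_iter`), the `evaluate_baseline` helper is hypothetical, and the toy sentences are invented, so the resulting numbers say nothing about the paper's results.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.pipeline import make_pipeline


def evaluate_baseline(train_texts, train_labels, test_texts, test_labels):
    """Fit TF-IDF + logistic regression and report accuracy and ROC-AUC.

    Hypothetical helper; settings are assumptions, not the paper's configuration.
    """
    model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
    model.fit(train_texts, train_labels)
    preds = model.predict(test_texts)
    # ROC-AUC is computed from the predicted probability of the positive class.
    probs = model.predict_proba(test_texts)[:, 1]
    return {
        "accuracy": accuracy_score(test_labels, preds),
        "roc_auc": roc_auc_score(test_labels, probs),
    }


# Toy data, invented for illustration (label 0 = human, 1 = AI-generated).
train_texts = [
    "honestly i just think the book was kinda boring",
    "we went to the lake and it rained the whole time",
    "in conclusion, the novel explores several important themes",
    "overall, this experience highlights the importance of preparation",
]
train_labels = [0, 0, 1, 1]
test_texts = [
    "i think the movie was kinda long",
    "in conclusion, the film highlights several important themes",
]
test_labels = [0, 1]

metrics = evaluate_baseline(train_texts, train_labels, test_texts, test_labels)
print(metrics)
```

Accuracy depends on a fixed decision threshold, while ROC-AUC summarizes how well the model ranks AI-generated text above human text across all thresholds, which is why two models with similar accuracy can differ in ROC-AUC.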

Taxonomy

Topics: Academic integrity and plagiarism · Topic Modeling · Text Readability and Simplification