AIDetx: a compression-based method for identification of machine-learning generated text

Leonardo Almeida; Pedro Rodrigues; Diogo Magalhães; Armando J. Pinho; Diogo Pratas

arXiv:2411.19869·cs.CL·December 2, 2024

TL;DR

AIDetx is a compression-based method that effectively detects machine-generated text with high accuracy, interpretability, and lower computational costs compared to traditional deep learning classifiers.

Contribution

The paper introduces a novel compression-based framework using finite-context models for identifying AI-generated text, offering improved efficiency and interpretability.

Findings

01

Achieved F1 scores exceeding 97% and 99%, respectively, on two benchmark datasets.

02

Significantly reduced training time and hardware requirements compared to LLM-based methods.

03

Provided a publicly available implementation.

Abstract

This paper introduces AIDetx, a novel method for detecting machine-generated text using data compression techniques. Traditional approaches, such as deep learning classifiers, often suffer from high computational costs and limited interpretability. To address these limitations, we propose a compression-based classification framework that leverages finite-context models (FCMs). AIDetx constructs distinct compression models for human-written and AI-generated text, classifying new inputs based on which model achieves a higher compression ratio. We evaluated AIDetx on two benchmark datasets, achieving F1 scores exceeding 97% and 99%, respectively, highlighting its high accuracy. Compared to current methods, such as large language models (LLMs), AIDetx offers a more interpretable and computationally efficient solution, significantly reducing both training time and hardware requirements…
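The classification rule described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the class name `FiniteContextModel`, the context order, the smoothing parameter `alpha`, and the fixed `alphabet_size` are all assumptions chosen for illustration. Each model counts which symbol follows each length-k context in its training text, and the estimated number of bits needed to encode a new input stands in for the compression ratio (fewer bits for the same input means better compression).

```python
import math
from collections import Counter, defaultdict


class FiniteContextModel:
    """Illustrative order-k finite-context model (hypothetical, not the paper's code)."""

    def __init__(self, order=3, alpha=0.5, alphabet_size=256):
        self.order = order                  # context length k (assumed value)
        self.alpha = alpha                  # additive smoothing (assumed value)
        self.alphabet_size = alphabet_size  # assumed symbol-space size
        self.counts = defaultdict(Counter)  # context -> counts of next symbol

    def train(self, text):
        # Count how often each symbol follows each length-k context.
        for i in range(self.order, len(text)):
            context = text[i - self.order:i]
            self.counts[context][text[i]] += 1

    def bits(self, text):
        # Estimated code length of `text` under this model, in bits.
        total = 0.0
        for i in range(self.order, len(text)):
            context = text[i - self.order:i]
            seen = self.counts.get(context)
            count = seen[text[i]] if seen else 0
            context_total = sum(seen.values()) if seen else 0
            p = (count + self.alpha) / (
                context_total + self.alpha * self.alphabet_size
            )
            total += -math.log2(p)
        return total


def classify(text, human_model, ai_model):
    # The model that needs fewer bits compresses the input better.
    if human_model.bits(text) <= ai_model.bits(text):
        return "human"
    return "ai"
```

In use, one model would be trained on a corpus of human-written text and the other on AI-generated text, after which `classify(sample, human_model, ai_model)` returns the label of the model that encodes `sample` more compactly. The per-symbol bit costs also show which parts of an input each model finds surprising, which is one way a compression-based classifier can be inspected.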

Code & Models

Repositories

aidetx/aidetx
Official

Taxonomy

Topics: Natural Language Processing Techniques