Classification of Human- and AI-Generated Texts for English, French, German, and Spanish

Kristina Schaaff; Tim Schlippe; Lorenz Mindner

arXiv:2312.04882·cs.CL·January 31, 2024·2 cites

TL;DR

This study develops a multilingual classifier to distinguish human from AI-generated texts across four languages, demonstrating high accuracy and feature portability, and analyzing both original and rephrased AI texts.

Contribution

Introduces a new multilingual corpus and comprehensive feature set for classifying AI-generated texts in four languages, including analysis of rephrased content.

Findings

01

High F1-scores across all four languages (95-99%) for detecting text generated by AI from scratch.

02

Features are portable and effective across different languages.

03

Different feature combinations optimize detection for original and rephrased texts.

Abstract

In this paper we analyze features to classify human- and AI-generated text for English, French, German and Spanish and compare them across languages. We investigate two scenarios: (1) The detection of text generated by AI from scratch, and (2) the detection of text rephrased by AI. For training and testing the classifiers in this multilingual setting, we created a new text corpus covering 10 topics for each language. For the detection of AI-generated text, the combination of all proposed features performs best, indicating that our features are portable to other related languages: The F1-scores are close with 99% for Spanish, 98% for English, 97% for German and 95% for French. For the detection of AI-rephrased text, the systems with all features outperform systems with other features in many cases, but using only document features performs best for German (72%) and Spanish (86%) and only…
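The results above are reported as F1-scores. As a reminder of what that metric measures, here is a minimal sketch that computes F1 for a two-class human/AI labelling task. The label strings `"ai"` and `"human"`, the function name `f1_score`, and the choice of `"ai"` as the positive class are assumptions made for illustration; the abstract does not say which averaging scheme the authors used, so this should not be read as their evaluation code.

```python
from typing import Sequence


def f1_score(y_true: Sequence[str], y_pred: Sequence[str], positive: str = "ai") -> float:
    """Return the F1-score of the `positive` class (hypothetical label "ai")."""
    pairs = list(zip(y_true, y_pred))
    # True positives: AI texts correctly flagged as AI.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    # False positives: human texts wrongly flagged as AI.
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    # False negatives: AI texts that were missed.
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # Precision or recall is zero (or undefined), so F1 is taken as 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # F1 is the harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    gold = ["ai", "ai", "human", "human"]
    predicted = ["ai", "human", "human", "human"]
    print(f"F1 = {f1_score(gold, predicted):.3f}")
```

In this toy example one of the two AI texts is missed, so precision is 1.0 and recall is 0.5. Because F1 is a harmonic mean, it is pulled toward the lower of the two values.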

Taxonomy

Topics: Natural Language Processing Techniques · Text Readability and Simplification · Topic Modeling