Authorship Attribution in Multilingual Machine-Generated Texts

Lucio La Cava; Dominik Macko; Róbert Móro; Ivan Srba; Andrea Tagarelli

arXiv:2508.01656·cs.CL·August 5, 2025

Open Access

TL;DR

This paper explores the challenge of attributing multilingual texts to either human or specific LLM generators, revealing the limitations of current monolingual methods and emphasizing the need for more robust multilingual attribution techniques.

Contribution

It introduces the novel problem of multilingual authorship attribution for LLMs, evaluates existing monolingual methods across multiple languages, and highlights their limitations in cross-lingual transferability.

Findings

01

Monolingual authorship attribution (AA) methods can be adapted to some multilingual contexts.

02

Significant challenges exist in transferring attribution across diverse languages.

03

Multilingual AA remains a complex problem requiring more robust solutions.
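
As a rough illustration of how cross-lingual transfer could be inspected, the sketch below computes attribution accuracy separately for each language. It is a minimal sketch, not the paper's evaluation protocol: the function name `per_language_accuracy`, the generator labels, and the toy predictions are all hypothetical.

```python
from collections import defaultdict


def per_language_accuracy(records):
    """Return accuracy per language.

    records: iterable of (language, true_generator, predicted_generator).
    Hypothetical helper for illustration only.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for lang, true_label, pred_label in records:
        total[lang] += 1
        correct[lang] += int(true_label == pred_label)
    return {lang: correct[lang] / total[lang] for lang in total}


# Toy, made-up predictions (not results from the paper).
toy_records = [
    ("en", "human", "human"),
    ("en", "llm_a", "llm_a"),
    ("sk", "human", "llm_a"),
    ("sk", "llm_b", "llm_b"),
]
scores = per_language_accuracy(toy_records)
```

Reporting one score per language, rather than a single pooled score, keeps a weak language from being hidden by strong results elsewhere.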

Abstract

As Large Language Models (LLMs) have reached human-like fluency and coherence, distinguishing machine-generated text (MGT) from human-written content has become increasingly difficult. While early efforts in MGT detection focused on binary classification, the growing number and diversity of LLMs call for a more fine-grained yet challenging task: authorship attribution (AA), i.e., identifying the precise generator (LLM or human) behind a text. However, AA currently remains confined to monolingual settings, with English the most investigated language, overlooking the multilingual nature and usage of modern LLMs. In this work, we introduce the problem of Multilingual Authorship Attribution, which involves attributing texts to human or multiple LLM generators across diverse languages. Focusing on 18 languages -- covering multiple families and writing scripts -- and 8 generators (7…

Taxonomy

Topics: Authorship Attribution and Profiling · Hate Speech and Cyberbullying Detection · Topic Modeling