Authorship Attribution in Multilingual Machine-Generated Texts
Lucio La Cava, Dominik Macko, Róbert Móro, Ivan Srba, Andrea Tagarelli

TL;DR
This paper explores the challenge of attributing multilingual texts to either human or specific LLM generators, revealing the limitations of current monolingual methods and emphasizing the need for more robust multilingual attribution techniques.
Contribution
It introduces the novel problem of multilingual authorship attribution for LLMs, evaluates existing monolingual methods across multiple languages, and highlights their limitations in cross-lingual transferability.
Findings
Monolingual AA methods can be adapted to some multilingual contexts.
Significant challenges exist in transferring attribution across diverse languages.
Multilingual AA remains a complex problem requiring more robust solutions.
Abstract
As Large Language Models (LLMs) have reached human-like fluency and coherence, distinguishing machine-generated text (MGT) from human-written content becomes increasingly difficult. While early efforts in MGT detection have focused on binary classification, the growing landscape and diversity of LLMs require a more fine-grained yet challenging authorship attribution (AA), i.e., being able to identify the precise generator (LLM or human) behind a text. However, AA remains nowadays confined to a monolingual setting, with English being the most investigated one, overlooking the multilingual nature and usage of modern LLMs. In this work, we introduce the problem of Multilingual Authorship Attribution, which involves attributing texts to human or multiple LLM generators across diverse languages. Focusing on 18 languages -- covering multiple families and writing scripts -- and 8 generators (7…
Taxonomy
Topics: Authorship Attribution and Profiling · Hate Speech and Cyberbullying Detection · Topic Modeling
