GerAV: Towards New Heights in German Authorship Verification using Fine-Tuned LLMs on a New Benchmark

Lotta Kiefer; Christoph Leiter; Sotaro Takeshita; Elena Schmidt; Steffen Eger

arXiv:2601.13711·cs.CL·April 24, 2026

TL;DR

GerAV is a large German authorship verification benchmark built from diverse data sources. It enables systematic evaluation of models and shows that fine-tuned LLMs outperform existing baselines and GPT-5 on this task.

Contribution

This paper presents GerAV, a new comprehensive benchmark for German authorship verification (AV) with over 400k labeled text pairs, and evaluates strong baselines and state-of-the-art models on it, showing that fine-tuned LLMs achieve the best results.

Findings

01. Fine-tuned LLMs outperform recent baselines by up to 0.09 in F1 score.

02. Models trained on a specific data type perform best when evaluated under matching conditions.

03. Combining training sources improves model generalization across data regimes.

Abstract

Authorship verification (AV) is the task of determining whether two texts were written by the same author and has been studied extensively, predominantly for English data. In contrast, large-scale benchmarks and systematic evaluations for other languages remain scarce. We address this gap by introducing GerAV, a comprehensive benchmark for German AV comprising over 400k labeled text pairs. GerAV is built from Twitter and Reddit data, with the Reddit part further divided into in-domain and cross-domain message-based subsets, as well as a profile-based subset. This design enables controlled analysis of the effects of data source, topical domain, and text length. Using the provided training splits, we conduct a systematic evaluation of strong baselines and state-of-the-art models and find that our best approach, a fine-tuned large language model, outperforms recent baselines by up to 0.09…
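To make the task setup concrete, the following is a minimal sketch of AV as binary classification over text pairs, scored with F1. It is not the paper's method or data format: the `TextPair` schema, the character-trigram cosine baseline, the 0.5 threshold, and the use of positive-class F1 are all assumptions chosen for illustration.

```python
from collections import Counter
from dataclasses import dataclass
from math import sqrt


@dataclass(frozen=True)
class TextPair:
    """One AV instance (hypothetical schema, not GerAV's actual format)."""
    text_a: str
    text_b: str
    same_author: bool


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count overlapping character n-grams of the lowercased text."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm = sqrt(sum(c * c for c in a.values())) * sqrt(sum(c * c for c in b.values()))
    return dot / norm if norm else 0.0


def predict(pair: TextPair, threshold: float = 0.5) -> bool:
    """Toy baseline: predict 'same author' if trigram similarity reaches the threshold."""
    similarity = cosine(char_ngrams(pair.text_a), char_ngrams(pair.text_b))
    return similarity >= threshold


def f1_score(gold: list[bool], pred: list[bool]) -> float:
    """F1 of the positive ('same author') class; an assumed F1 variant."""
    tp = sum(g and p for g, p in zip(gold, pred))
    fp = sum((not g) and p for g, p in zip(gold, pred))
    fn = sum(g and (not p) for g, p in zip(gold, pred))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0
```

A fine-tuned LLM would replace `predict` in this setup; the pair structure and the scoring step stay the same, which is what makes results comparable across models on a fixed benchmark split.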
