Where Does Authorship Signal Emerge in Encoder-Based Language Models?

Francis Kulumba; Guillaume Vimont; Laurent Romary; Florian Cafiero

arXiv:2605.19908·cs.CL·May 20, 2026

TL;DR

This paper investigates where authorship signals emerge in encoder-based language models, revealing that the scorer's design influences the layer at which authorship information is consolidated.

Contribution

The paper shows that the scoring mechanism, not representation quality, explains the performance gap between authorship attribution models and determines where the encoder consolidates authorship signal.

Findings

01

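Stylistic features such as word length, punctuation density, and function-word frequency are equally available at every layer in every model, including an off-the-shelf control encoder.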

02

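Causal intervention shows that the scorer determines where authorship signal is consolidated: mean pooling forces consolidation by early to mid layers, while late interaction defers it to later layers.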

03

Different scorers exhibit distinct gradient structures and learning trajectories.

Abstract

Authorship attribution models fine-tuned with the same pretrained encoder, data, and loss can differ four-fold in performance depending only on their scoring mechanism. We use mechanistic interpretability tools to explain this gap. Stylistic features such as word length, punctuation density, and function-word frequency are equally available at every layer in every model, including in an off-the-shelf control encoder, so the gap does not come from representation quality. Instead, causal intervention shows that the scorer determines where the encoder consolidates authorship signal. Mean pooling forces consolidation by early to mid layers, while late interaction defers it to later layers. We further derive this difference from the gradient structure of each scorer, and training dynamics reveal distinct learning trajectories that follow from that difference.
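
To make the two scorers named in the abstract concrete, here is a minimal NumPy sketch of how a mean-pooling scorer and a late-interaction scorer might compare two documents given their token embeddings. It is an illustration under stated assumptions, not the authors' implementation: the late-interaction form is assumed to be a ColBERT-style MaxSim averaged over query tokens, padding masks are omitted, and the function names and random inputs are hypothetical.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length along `axis` (eps guards against zero norm)."""
    norm = np.linalg.norm(x, axis=axis, keepdims=True)
    return x / np.maximum(norm, eps)


def mean_pooling_score(a, b):
    """Cosine similarity between the mean token embeddings of two documents.

    a, b: arrays of shape (num_tokens, hidden_dim).
    Every token contributes to one pooled vector per document.
    """
    va = l2_normalize(a.mean(axis=0))
    vb = l2_normalize(b.mean(axis=0))
    return float(va @ vb)


def late_interaction_score(a, b):
    """Assumed ColBERT-style MaxSim, averaged over the tokens of `a`.

    Each token of `a` is matched to its most similar token of `b`,
    so only the winning token pairs contribute to the score.
    """
    sim = l2_normalize(a) @ l2_normalize(b).T  # shape (len_a, len_b)
    return float(sim.max(axis=1).mean())


# Hypothetical token embeddings standing in for encoder outputs.
rng = np.random.default_rng(0)
doc_a = rng.normal(size=(5, 8))
doc_b = rng.normal(size=(7, 8))

mp = mean_pooling_score(doc_a, doc_b)
li = late_interaction_score(doc_a, doc_b)
```

In this sketch the mean-pooling score depends on every token through a single averaged vector, whereas the late-interaction score depends only on the best-matching token pairs. That structural difference is the kind of contrast the abstract attributes to the gradient structure of each scorer, although the paper's exact formulation may differ.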

Code & Models

Repositories

madjakul/DeepStylometry
github

Datasets

almanach/halvest-contrastive
dataset· 3.6k dl
