Explainable Disentangled Representation Learning for Generalizable Authorship Attribution in the Era of Generative AI

Hieu Man; Van-Cuong Pham; Nghia Trung Ngo; Franck Dernoncourt; Thien Huu Nguyen

arXiv:2604.21300·cs.CL·April 24, 2026

TL;DR

This paper introduces EAVAE (Explainable Authorship Variational Autoencoder), a framework that disentangles style from content in text representations. The aim is authorship attribution and AI-generated text detection that generalize across domains and produce interpretable decisions.

Contribution

EAVAE separates style and content representations by design, using a separate encoder for each, and enforces the separation with a discriminator that also provides explanations. The authors report state-of-the-art results.

Findings

1. Achieves state-of-the-art accuracy on Amazon Reviews, PAN21, and HRS datasets.

2. Excels in few-shot AI-generated text detection on the M4 dataset.

3. Provides interpretable decisions through natural language explanations.

Abstract

Learning robust representations of authorial style is crucial for authorship attribution and AI-generated text detection. However, existing methods often struggle with content-style entanglement, where models learn spurious correlations between authors' writing styles and topics, leading to poor generalization across domains. To address this challenge, we propose Explainable Authorship Variational Autoencoder (EAVAE), a novel framework that explicitly disentangles style from content through architectural separation-by-design. EAVAE first pretrains style encoders using supervised contrastive learning on diverse authorship data, then finetunes with a Variational Autoencoder (VAE) architecture using separate encoders for style and content representations. Disentanglement is enforced through a novel discriminator that not only distinguishes whether pairs of style/content representations…
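
The pretraining stage mentioned in the abstract relies on supervised contrastive learning, where texts by the same author are pulled together in embedding space and texts by different authors are pushed apart. The sketch below illustrates one common form of such a loss (the SupCon formulation of Khosla et al.). It is an assumption that EAVAE uses this exact form: the function name, the temperature value, and the toy embeddings are hypothetical and do not come from the paper or its repository.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def supervised_contrastive_loss(embeddings, author_ids, temperature=0.1):
    """Illustrative SupCon-style loss over a batch of style embeddings.

    embeddings: list of equal-length float lists (one per text).
    author_ids: list of labels; texts sharing a label are positives.
    temperature: assumed value, not taken from the paper.
    """
    z = [l2_normalize(e) for e in embeddings]
    n = len(z)
    total, num_anchors = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n)
                     if j != i and author_ids[j] == author_ids[i]]
        if not positives:
            continue  # an anchor with no same-author text contributes nothing
        # Temperature-scaled cosine similarity to every other item in the batch.
        sims = {j: sum(a * b for a, b in zip(z[i], z[j])) / temperature
                for j in range(n) if j != i}
        # Numerically stable log-sum-exp over all non-anchor items.
        max_sim = max(sims.values())
        log_denom = max_sim + math.log(
            sum(math.exp(s - max_sim) for s in sims.values()))
        # Average negative log-probability of picking a same-author item.
        total += -sum(sims[j] - log_denom for j in positives) / len(positives)
        num_anchors += 1
    return total / num_anchors if num_anchors else 0.0


# Toy batch: two texts near each of two directions in a 2-D "style space".
toy_embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
aligned = supervised_contrastive_loss(toy_embeddings, ["a", "a", "b", "b"])
shuffled = supervised_contrastive_loss(toy_embeddings, ["a", "b", "a", "b"])
print(aligned, shuffled)
```

In the toy batch, the loss is lower when the author labels agree with the geometric clusters than when they are shuffled, which is the behavior an encoder trained with this objective is pushed toward.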

Code & Models

Repositories

hieum98/avae (GitHub)