TL;DR
This paper introduces EAVAE, a novel framework that disentangles style from content in text representations, improving authorship attribution and AI-generated text detection with enhanced interpretability and generalization.
Contribution
EAVAE separates style and content representations by design, using a dedicated encoder for each, and enforces the separation with a discriminator that also provides explanations for its decisions. The paper reports state-of-the-art results on authorship attribution benchmarks.
Findings
Achieves state-of-the-art accuracy on Amazon Reviews, PAN21, and HRS datasets.
Excels in few-shot AI-generated text detection on the M4 dataset.
Provides interpretable decisions through natural language explanations.
Abstract
Learning robust representations of authorial style is crucial for authorship attribution and AI-generated text detection. However, existing methods often struggle with content-style entanglement, where models learn spurious correlations between authors' writing styles and topics, leading to poor generalization across domains. To address this challenge, we propose Explainable Authorship Variational Autoencoder (EAVAE), a novel framework that explicitly disentangles style from content through architectural separation-by-design. EAVAE first pretrains style encoders using supervised contrastive learning on diverse authorship data, then finetunes with a Variational Autoencoder (VAE) architecture using separate encoders for style and content representations. Disentanglement is enforced through a novel discriminator that not only distinguishes whether pairs of style/content representations…
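The abstract names supervised contrastive learning as the pretraining step for the style encoders but does not give the loss. As a rough illustration only, the sketch below assumes the commonly used supervised contrastive (SupCon) formulation, where texts by the same author are pulled together and texts by different authors are pushed apart. The function names, the temperature value, and the toy embeddings are hypothetical and are not taken from the paper.

```python
import math


def l2_normalize(v):
    # Scale a non-zero vector to unit length so dot products are cosine similarities.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def supcon_loss(embeddings, author_ids, temperature=0.1):
    """Supervised contrastive loss over a batch of style embeddings.

    Assumption: this is the standard SupCon objective, not necessarily
    the exact loss used by EAVAE. Anchors with no same-author example
    in the batch are skipped.
    """
    z = [l2_normalize(v) for v in embeddings]
    n = len(z)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n)
                     if j != i and author_ids[j] == author_ids[i]]
        if not positives:
            continue
        # Temperature-scaled similarity of the anchor to every other example.
        sims = {j: sum(a * b for a, b in zip(z[i], z[j])) / temperature
                for j in range(n) if j != i}
        # Log-sum-exp with max subtraction for numerical stability.
        m = max(sims.values())
        log_denom = m + math.log(sum(math.exp(s - m) for s in sims.values()))
        # Average negative log-probability of picking a same-author example.
        total += -sum(sims[p] - log_denom for p in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


# Toy batch: two tight clusters of 2-d "style" vectors.
toy_embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
aligned = supcon_loss(toy_embeddings, ["a", "a", "b", "b"])
misaligned = supcon_loss(toy_embeddings, ["a", "b", "a", "b"])
```

When the author labels agree with the geometric clusters, the loss is lower than when they cut across them, which is the pressure that pushes an encoder to group texts by author rather than by other features.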
