Whodunit? Learning to Contrast for Authorship Attribution

Bo Ai; Yuchen Wang; Yugin Tan; Samson Tan

arXiv:2209.11887 · cs.CL · October 11, 2022 · 1 citation

TL;DR

This paper introduces Contra-X, a contrastive learning approach that fine-tunes pre-trained language models to create highly separable author-specific text representations, significantly improving authorship attribution accuracy.
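To make "highly separable author-specific representations" concrete, here is a minimal sketch of a supervised contrastive loss over one batch. Texts by the same author act as positives and texts by other authors as negatives. This is a generic SupCon-style formulation written in plain Python, not code from the official repository. The function names `supcon_loss` and `normalize` and the `temperature` default of 0.1 are hypothetical choices for illustration.

```python
import math
from typing import List, Sequence

def normalize(v: Sequence[float]) -> List[float]:
    """Scale a vector to unit L2 norm so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        return list(v)  # leave an all-zero vector unchanged
    return [x / norm for x in v]

def supcon_loss(embeddings: Sequence[Sequence[float]], labels: Sequence[int],
                temperature: float = 0.1) -> float:
    """Supervised contrastive loss for one batch (illustrative sketch).

    Each anchor is pulled toward other texts by the same author (positives)
    and pushed away from texts by other authors. Anchors with no positive
    in the batch are skipped.
    """
    z = [normalize(e) for e in embeddings]
    n = len(z)
    total, counted = 0.0, 0
    for i in range(n):
        # Temperature-scaled cosine similarity from anchor i to every other sample.
        sims = {j: sum(a * b for a, b in zip(z[i], z[j])) / temperature
                for j in range(n) if j != i}
        positives = [j for j in sims if labels[j] == labels[i]]
        if not positives:
            continue
        # Log of the softmax denominator, computed with the max-shift trick for stability.
        m = max(sims.values())
        log_denom = m + math.log(sum(math.exp(s - m) for s in sims.values()))
        # Average negative log-probability assigned to each same-author positive.
        total += -sum(sims[p] - log_denom for p in positives) / len(positives)
        counted += 1
    return total / counted if counted else 0.0

# Toy batch: two 2-D "embeddings" per author.
batch = [[1.0, 0.1], [0.9, 0.2], [-1.0, 0.1], [-0.8, -0.2]]
print(supcon_loss(batch, [0, 0, 1, 1]) < supcon_loss(batch, [0, 1, 0, 1]))  # True
```

When the labels match the geometry (each author's texts already point in the same direction) the loss is near zero; when labels cut across the clusters it is large. Minimizing this kind of loss during fine-tuning is what pushes same-author texts together.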

Contribution

It is the first work to combine contrastive learning with pre-trained language model fine-tuning for authorship attribution, and it achieves state-of-the-art results on multiple human and machine authorship attribution benchmarks.

Findings

01. Achieves up to 6.8% accuracy improvement over cross-entropy fine-tuning.
02. Learns highly separable clusters for different authors.
03. Improves overall accuracy but may reduce performance for some individual authors.
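One common way to put a number on "highly separable clusters" is the mean silhouette coefficient, which compares each point's average distance to its own cluster with its average distance to the nearest other cluster. The sketch below computes it in plain Python over author-labelled embeddings. It is an illustrative metric chosen here, not necessarily how the paper assesses cluster quality, and the names `euclidean` and `mean_silhouette` are hypothetical.

```python
import math
from typing import Hashable, List, Sequence

def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Straight-line distance between two embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def mean_silhouette(points: Sequence[Sequence[float]],
                    labels: Sequence[Hashable]) -> float:
    """Mean silhouette coefficient in [-1, 1]; higher means tighter,
    better-separated clusters. Requires at least two distinct labels."""
    groups = set(labels)
    if len(groups) < 2:
        raise ValueError("need at least two clusters (authors)")
    scores: List[float] = []
    for i, p in enumerate(points):
        # a: mean distance to the other members of this point's own cluster.
        same = [euclidean(p, q) for j, q in enumerate(points)
                if j != i and labels[j] == labels[i]]
        if not same:
            scores.append(0.0)  # usual convention for singleton clusters
            continue
        a = sum(same) / len(same)
        # b: mean distance to the members of the nearest other cluster.
        b = min(
            sum(euclidean(p, q) for q, lab in zip(points, labels) if lab == g)
            / labels.count(g)
            for g in groups if g != labels[i]
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)

pts = [[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]]
print(round(mean_silhouette(pts, ["A", "A", "B", "B"]), 2))  # 0.9
print(round(mean_silhouette(pts, ["A", "B", "A", "B"]), 2))  # -0.45
```

A per-author breakdown of the same scores (instead of the overall mean) is one way to look for the third finding: a high average can coexist with a few authors whose points sit poorly inside their cluster.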

Abstract

Authorship attribution is the task of identifying the author of a given text. The key is finding representations that can differentiate between authors. Existing approaches typically use manually designed features that capture a dataset's content and style, but these approaches are dataset-dependent and yield inconsistent performance across corpora. In this work, we propose learning author-specific representations by fine-tuning pre-trained generic language representations with a contrastive objective (Contra-X). We show that Contra-X learns representations that form highly separable clusters for different authors. It advances the state-of-the-art on multiple human and machine authorship attribution benchmarks, enabling improvements of up to 6.8% over cross-entropy fine-tuning. However, we find that Contra-X improves overall accuracy at the cost of sacrificing performance for…
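The abstract describes fine-tuning with a contrastive objective and compares the result against cross-entropy fine-tuning. A minimal way to write such a training objective, assuming a supervised contrastive term is added to the standard cross-entropy classification loss with a hypothetical weight λ (the paper's exact formulation and weighting may differ), is shown below. Here z_i is the L2-normalized representation of text i in a batch, P(i) is the set of other texts in the batch by the same author, I is the set of anchors with at least one such positive, and τ is a temperature.

```latex
\mathcal{L} = \mathcal{L}_{\mathrm{CE}} + \lambda\,\mathcal{L}_{\mathrm{con}},
\qquad
\mathcal{L}_{\mathrm{con}} = \frac{1}{|I|}\sum_{i \in I} \frac{-1}{|P(i)|}\sum_{p \in P(i)}
\log \frac{\exp(z_i \cdot z_p / \tau)}{\sum_{a \neq i} \exp(z_i \cdot z_a / \tau)}
```

Setting λ = 0 recovers plain cross-entropy fine-tuning, which is the baseline the reported 6.8% improvement is measured against.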

Code & Models

Repositories

boai01/contra-x (PyTorch, official)

Taxonomy

Topics: Authorship Attribution and Profiling

Methods: Contrastive Learning