TL;DR
This paper introduces a self-supervised method to explicitly learn syntactic sentence representations, which enhances authorship attribution performance by capturing structural information beyond traditional word embeddings.
Contribution
It proposes a novel self-supervised framework combining lexical and syntactic sub-networks to explicitly encode sentence structure for authorship attribution.
Findings
Structural embeddings improve classification accuracy.
Explicit syntactic representation enhances authorship attribution.
Concatenating structural and word embeddings yields better results.
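As a rough illustration of the last finding, the sketch below concatenates a word embedding with a structural embedding at each token position and then mean-pools the result into one sentence vector. The per-token granularity, the mean pooling, the toy dimensions, and the helper names `concat_embeddings` and `mean_pool` are all assumptions made for illustration; the summary does not say how the paper combines the two representations.

```python
from typing import List

Vector = List[float]


def concat_embeddings(word_vecs: List[Vector], struct_vecs: List[Vector]) -> List[Vector]:
    """Concatenate each word vector with the structural vector at the same position."""
    if len(word_vecs) != len(struct_vecs):
        raise ValueError("expected one structural vector per word vector")
    # List addition appends the structural features after the lexical ones.
    return [w + s for w, s in zip(word_vecs, struct_vecs)]


def mean_pool(vectors: List[Vector]) -> Vector:
    """Average a sequence of equal-length vectors into a single vector."""
    if not vectors:
        raise ValueError("cannot pool an empty sequence")
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


# Toy inputs (hypothetical values): two tokens, 3-d word and 2-d structural embeddings.
word_vecs = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
struct_vecs = [[1.0, 0.0], [0.0, 1.0]]

combined = concat_embeddings(word_vecs, struct_vecs)  # each token is now 5-d
sentence_vec = mean_pool(combined)                    # one 5-d sentence vector
```

A downstream authorship classifier would then take `sentence_vec` (or the unpooled `combined` sequence) as input in place of word embeddings alone.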
Abstract
The syntactic structure of the sentences in a document carries substantial information about its author's writing style. Sentence representation learning has been widely explored in recent years and has been shown to improve generalization on downstream tasks across many domains. Although probing studies suggest that these learned contextual representations implicitly encode some amount of syntax, explicit syntactic information further improves the performance of deep neural models in authorship attribution. These observations motivated us to investigate explicit representation learning of the syntactic structure of sentences. In this paper, we propose a self-supervised framework for learning structural representations of sentences. The self-supervised network contains two components: a lexical sub-network and a syntactic…
