Enhancing Representation Generalization in Authorship Identification
Haining Wang

TL;DR
This paper reviews methods for improving the generalization of stylistic features in authorship identification, emphasizing the importance of feature selection and deep learning approaches for cross-domain accuracy.
Contribution
It systematically analyzes stylistic features and strategies to enhance cross-domain generalization in authorship identification, highlighting the effectiveness of deep learning models.
Findings
Character n-grams and function words are robust features.
Content words can bias and reduce cross-domain accuracy.
Deep learning models with syntactic information improve generalization.
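The findings above name character n-grams and function words as robust stylistic features and identify content words as a source of topic bias. A minimal Python sketch of how such features are commonly extracted follows. It is an illustration under stated assumptions, not the paper's method. The FUNCTION_WORDS set is a deliberately tiny, hypothetical list. The helper names (char_ngrams, function_word_profile, mask_content_words) are also hypothetical. Replacing content words with asterisks is one simple way of suppressing topic vocabulary, chosen here for illustration only.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny function-word list for illustration only;
# studies of this kind typically use curated lists of a few hundred words.
FUNCTION_WORDS = {
    "the", "a", "an", "of", "and", "or", "but", "in", "on", "at", "to",
    "for", "with", "by", "from", "that", "this", "it", "is", "was", "be",
    "as", "not", "which", "who", "i", "you", "he", "she", "we", "they",
}

TOKEN_RE = re.compile(r"[a-z']+")


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphabetic word tokens."""
    return TOKEN_RE.findall(text.lower())


def char_ngrams(text: str, n: int = 3) -> Counter:
    """Count overlapping character n-grams, including spaces and punctuation."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def function_word_profile(text: str) -> dict[str, float]:
    """Relative frequency of each function word among all word tokens."""
    tokens = tokenize(text)
    if not tokens:
        return {w: 0.0 for w in sorted(FUNCTION_WORDS)}
    counts = Counter(t for t in tokens if t in FUNCTION_WORDS)
    return {w: counts[w] / len(tokens) for w in sorted(FUNCTION_WORDS)}


def mask_content_words(text: str) -> str:
    """Replace every non-function word with '*' of equal length so that
    topic-bearing vocabulary cannot drive downstream features."""
    return re.sub(
        r"[A-Za-z']+",
        lambda m: m.group(0)
        if m.group(0).lower() in FUNCTION_WORDS
        else "*" * len(m.group(0)),
        text,
    )


if __name__ == "__main__":
    sample = "The cat sat on the mat, and it was happy."
    print(mask_content_words(sample))  # The *** *** on the ***, and it was *****.
    print(char_ngrams(sample, 3).most_common(3))
    print(function_word_profile(sample)["the"])  # 0.2
```

Masking keeps word lengths, punctuation, and function-word order intact, so character n-grams computed on the masked text still reflect an author's habits while no longer encoding topic words.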
Abstract
Authorship identification determines the authorship of texts whose origins are undisclosed. The reliability of authorship identification techniques has been attributed to their ability to capture and represent authorial style properly. Although modern authorship identification methods have evolved significantly over the years and have proven effective in distinguishing authorial styles, the generalization of stylistic features across domains has not been systematically reviewed. This work addresses the challenge of enhancing the generalization of stylistic representations in authorship identification, particularly when training and testing samples differ. A comprehensive review of empirical studies was conducted, focusing on various stylistic features and their effectiveness in representing an author's style. The influencing factors…
Taxonomy
Topics: Authorship Attribution and Profiling · Topic Modeling · Hate Speech and Cyberbullying Detection
