Understanding writing style in social media with a supervised contrastively pre-trained transformer

Javier Huertas-Tato; Alejandro Martin; David Camacho

arXiv:2310.11081 · cs.CL · October 18, 2023 · 2 citations

TL;DR

This paper introduces STAR, a supervised contrastively pre-trained transformer model trained on a large social media corpus to improve authorship attribution and understanding of online harmful behaviors.

Contribution

We propose STAR, a novel author representation model trained on 4.5 million texts using supervised contrastive loss, achieving state-of-the-art zero-shot attribution and clustering performance.
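As a rough illustration of what a supervised contrastive loss over author labels can look like, here is a minimal NumPy sketch of the general formulation: embeddings of texts by the same author are pulled together and all others are pushed apart. The function name, the temperature value, and the toy vectors are assumptions made for illustration; this is not taken from the paper's training code.

```python
import numpy as np


def supervised_contrastive_loss(embeddings, author_ids, temperature=0.1):
    """Generic supervised contrastive loss over a batch of text embeddings.

    Texts sharing an author id are treated as positives for each other.
    The temperature default is an illustrative assumption.
    """
    z = np.asarray(embeddings, dtype=np.float64)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)  # unit-length rows
    labels = np.asarray(author_ids)
    n = z.shape[0]

    # Pairwise cosine similarities scaled by the temperature.
    logits = z @ z.T / temperature
    self_mask = np.eye(n, dtype=bool)
    logits = np.where(self_mask, -np.inf, logits)  # never contrast with self

    # Numerically stable log-softmax over all other items in the batch.
    row_max = logits.max(axis=1, keepdims=True)
    log_denominator = row_max + np.log(
        np.exp(logits - row_max).sum(axis=1, keepdims=True)
    )
    log_prob = logits - log_denominator

    # Positives: same author, excluding the anchor itself.
    positives = (labels[:, None] == labels[None, :]) & ~self_mask
    n_pos = positives.sum(axis=1)
    has_pos = n_pos > 0
    if not has_pos.any():
        raise ValueError("batch needs at least two texts by the same author")

    # Average log-probability of the positives, per anchor, then over anchors.
    sum_log_prob = np.where(positives, log_prob, 0.0).sum(axis=1)
    per_anchor = -sum_log_prob[has_pos] / n_pos[has_pos]
    return float(per_anchor.mean())


# Toy batch: two texts each from two hypothetical authors.
toy_embeddings = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
loss_matching = supervised_contrastive_loss(toy_embeddings, [0, 0, 1, 1])
loss_mismatched = supervised_contrastive_loss(toy_embeddings, [0, 1, 0, 1])
```

In this toy batch the loss is lower when the author labels agree with the geometry of the embeddings than when they do not, which is the signal the encoder is trained to exploit.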

Findings

01

State-of-the-art zero-shot attribution and clustering performance on PAN challenges

02

80% accuracy in author identification among 1616 authors

03

Effective authorship verification with a simple dense layer
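One common way to run zero-shot attribution on top of an author representation model is to average a few known texts per author into a centroid and assign each new text to the most similar centroid. The sketch below assumes that setup; the helper names and the two-dimensional toy vectors are hypothetical stand-ins for real model embeddings, and the paper's exact evaluation procedure may differ.

```python
import numpy as np


def l2_normalize(x):
    """Scale each row vector to unit length."""
    x = np.asarray(x, dtype=np.float64)
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def author_centroids(support_embeddings, support_authors):
    """Average the support embeddings of each author into one centroid."""
    emb = l2_normalize(support_embeddings)
    authors = np.asarray(support_authors)
    names = np.unique(authors)  # sorted unique author names
    centroids = np.stack([emb[authors == name].mean(axis=0) for name in names])
    return names, l2_normalize(centroids)


def attribute(query_embeddings, names, centroids):
    """Assign each query text to the author with the most similar centroid."""
    similarities = l2_normalize(query_embeddings) @ centroids.T  # cosine
    return names[similarities.argmax(axis=1)].tolist()


# Toy data: hypothetical embeddings of known texts by two authors.
support = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
support_authors = ["alice", "alice", "bob", "bob"]
names, centroids = author_centroids(support, support_authors)

# Two unseen texts, one stylistically close to each author.
predicted = attribute([[0.8, 0.2], [0.2, 0.8]], names, centroids)
```

No classifier is trained here: attribution relies only on distances in the embedding space, which is why the quality of the learned representation matters so much in the zero-shot setting.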

Abstract

Online Social Networks serve as fertile ground for harmful behavior, ranging from hate speech to the dissemination of disinformation. Malicious actors now have unprecedented freedom to misbehave, leading to severe societal unrest and dire consequences, as exemplified by events such as the Capitol assault during the US presidential election and the Antivaxx movement during the COVID-19 pandemic. Understanding online language has become more pressing than ever. While existing works predominantly focus on content analysis, we aim to shift the focus towards understanding harmful behaviors by relating content to their respective authors. Numerous novel approaches attempt to learn the stylistic features of authors in texts, but many of these approaches are constrained by small datasets or sub-optimal training losses. To overcome these limitations, we introduce the Style Transformer for…

Code & Models

Repositories

jahuerta92/star
PyTorch · Official

Models

🤗 AIDA-UPM/star
model · 722 downloads · 7 likes

Taxonomy

Topics: Authorship Attribution and Profiling · Hate Speech and Cyberbullying Detection · Topic Modeling

Methods: Multi-Head Attention · Attention Is All You Need · Dense Connections · Linear Layer · Softmax · Residual Connection · Absolute Position Encodings · Layer Normalization · Adam · Byte Pair Encoding