BERT-based Authorship Attribution on the Romanian Dataset called ROST

Sanda-Maria Avram

arXiv:2301.12500 · cs.AI · January 31, 2023


TL;DR

This paper applies BERT-based models to authorship attribution on a diverse and highly unbalanced Romanian dataset, in some cases exceeding 87% macro-accuracy and showing that pre-trained language models are effective for this task.

Contribution

It introduces a BERT-based approach to authorship attribution for Romanian texts and evaluates it on a challenging, highly unbalanced dataset.

Findings

01

Achieved macro-accuracy exceeding 87% in some cases.

02

Showed that BERT is effective on a highly unbalanced Romanian-language dataset.

03

Demonstrated robustness across various text types and sources.

Abstract

Authorship attribution has been studied for decades and remains an active research problem. Among the more recent tools applied to it are pre-trained language models, the most prevalent being BERT. Here we use such a model to identify the authors of texts written in Romanian. The dataset is highly unbalanced, with significant differences in the number of texts per author, the sources from which the texts were collected, the period in which the authors lived and wrote, the medium for which the texts were intended (paper or online), and the type of writing (stories, short stories, fairy tales, novels, literary articles, and sketches). The results are better than expected, in some cases exceeding 87% macro-accuracy.
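Because the dataset is so unbalanced, the choice of metric matters. The sketch below assumes that "macro-accuracy" means the unweighted mean of per-author accuracy (per-class recall, also called balanced accuracy); the abstract does not define the metric, and the function names `macro_accuracy` and `plain_accuracy` and the toy labels are hypothetical. It illustrates why a classifier that favors the most prolific author can look good on plain accuracy yet score poorly on macro-accuracy.

```python
from collections import defaultdict


def plain_accuracy(y_true, y_pred):
    """Fraction of texts attributed to the correct author (every text counts equally)."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_accuracy(y_true, y_pred):
    """Unweighted mean of per-author accuracy (every author counts equally).

    Assumption: this is one common reading of 'macro-accuracy'
    (balanced accuracy), not necessarily the paper's exact definition.
    """
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    total = defaultdict(int)    # number of texts per true author
    correct = defaultdict(int)  # correctly attributed texts per true author
    for t, p in zip(y_true, y_pred):
        total[t] += 1
        if t == p:
            correct[t] += 1
    # Average the per-author accuracies so small authors weigh as much as large ones.
    return sum(correct[a] / total[a] for a in total) / len(total)


if __name__ == "__main__":
    # Hypothetical toy data: author "A" wrote 8 texts, author "B" wrote 2,
    # and a naive classifier attributes every text to "A".
    y_true = ["A"] * 8 + ["B"] * 2
    y_pred = ["A"] * 10
    print(plain_accuracy(y_true, y_pred), macro_accuracy(y_true, y_pred))  # prints "0.8 0.5"
```

In this toy case the classifier never identifies author B, so plain accuracy (0.8) hides the failure while macro-accuracy (0.5) exposes it; this is why a macro-averaged score is the more informative figure on an unbalanced dataset.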

Taxonomy

Topics: Authorship Attribution and Profiling · Topic Modeling

Methods: Attention Is All You Need · Linear Layer · Adam · Layer Normalization · Weight Decay · Multi-Head Attention · Residual Connection · Dense Connections · Dropout