A comparison of several AI techniques for authorship attribution on Romanian texts

Sanda Maria Avram; Mihai Oltean

arXiv:2211.05180·cs.AI·January 25, 2023

TL;DR

This paper compares several AI techniques for authorship attribution on Romanian texts, using a new dataset and features drawn from a limited set of parts of speech (prepositions, adverbs, and conjunctions). It finds that some algorithms perform reasonably well despite the task's difficulty.

Contribution

It introduces a new Romanian language dataset and evaluates multiple AI methods for authorship attribution, highlighting their relative effectiveness.

Findings

01

Some algorithms achieve decent error rates on the test set.

02

Authorship attribution remains a challenging problem.

03

Support Vector Machines and Neural Networks perform notably well.

Abstract

Determining the author of a text is a difficult task. Here we compare multiple AI techniques for classifying literary texts written by multiple authors, taking into account a limited number of parts of speech (prepositions, adverbs, and conjunctions). We also introduce a new dataset of texts written in Romanian, on which we ran the algorithms. The compared methods are Artificial Neural Networks, Support Vector Machines, Multi Expression Programming, Decision Trees with C5.0, and k-Nearest Neighbour. Numerical experiments show, first of all, that the problem is difficult, but that some algorithms achieve reasonably low error rates on the test set.
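As a rough illustration of the kind of pipeline the abstract describes, the sketch below turns each text into relative frequencies of a few function words and classifies a new text with k-Nearest Neighbour. It is a minimal sketch under stated assumptions, not the authors' implementation: the word list `FUNCTION_WORDS`, the toy sentences, the author labels, and the helper names `features` and `knn_predict` are all hypothetical, and diacritics are dropped for simplicity.

```python
import math
from collections import Counter

# Hypothetical feature list: a handful of Romanian prepositions,
# conjunctions, and adverbs (diacritics omitted). The paper's actual
# feature set is not reproduced here.
FUNCTION_WORDS = ["de", "la", "cu", "pe", "si", "dar", "sau", "nu", "mai", "foarte"]


def features(text):
    """Relative frequency of each function word in a whitespace-tokenized text."""
    tokens = text.lower().split()
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid division by zero on empty input
    return [counts[w] / total for w in FUNCTION_WORDS]


def knn_predict(train, query, k=1):
    """Majority vote among the k training vectors closest to `query` (Euclidean)."""
    nearest = sorted((math.dist(vec, query), author) for vec, author in train)
    votes = Counter(author for _, author in nearest[:k])
    return votes.most_common(1)[0][0]


# Invented toy texts and labels, for illustration only.
TOY_TEXTS = [
    ("casa de la munte este de lemn si de piatra", "A"),
    ("drumul de la sat trece pe langa casa de piatra", "A"),
    ("nu mai vine dar nu mai pleaca sau nu mai stie", "B"),
    ("nu stie dar mai vrea sau nu mai poate", "B"),
]
train = [(features(text), author) for text, author in TOY_TEXTS]

predicted = knn_predict(train, features("nu mai vrea dar nu mai poate"), k=3)
print(predicted)
```

The same feature vectors could be fed to any of the other compared classifiers; k-Nearest Neighbour is used here only because it fits in a few lines without external libraries.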

Code & Models

Repositories

sanda-avram/rost-source-code (official)

Taxonomy

Topics: Authorship Attribution and Profiling · Natural Language Processing Techniques · Hate Speech and Cyberbullying Detection

Methods: Test