A comparison of several AI techniques for authorship attribution on Romanian texts
Sanda Maria Avram, Mihai Oltean

TL;DR
This paper compares several AI techniques for authorship attribution on Romanian texts, using a new dataset and features drawn from a limited set of parts of speech (prepositions, adverbs, and conjunctions). Some algorithms perform reasonably well despite the task's difficulty.
Contribution
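The paper introduces a new Romanian-language dataset and evaluates five AI methods for authorship attribution on it: Artificial Neural Networks, Support Vector Machines, Multi Expression Programming, Decision Trees with C5.0, and k-Nearest Neighbour.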
Findings
Some algorithms achieve decent error rates on the test set.
Authorship attribution remains a challenging problem.
Support Vector Machines and Neural Networks perform notably well.
Abstract
Determining the author of a text is a difficult task. Here we compare multiple AI techniques for classifying literary texts written by multiple authors by taking into account a limited number of parts of speech (prepositions, adverbs, and conjunctions). We also introduce a new dataset composed of texts written in the Romanian language on which we have run the algorithms. The compared methods are Artificial Neural Networks, Support Vector Machines, Multi Expression Programming, Decision Trees with C5.0, and k-Nearest Neighbour. Numerical experiments show, first of all, that the problem is difficult, but some algorithms are able to achieve decent error rates on the test set.
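To make the setup in the abstract concrete, here is a minimal sketch of one of the compared method families (k-Nearest Neighbour) applied to function-word frequencies. It is not the authors' implementation: the word list `FUNCTION_WORDS`, the helpers `features` and `knn_predict`, the toy sentences, and the author labels are all hypothetical, chosen only for illustration, and the paper's actual feature set and dataset may differ.

```python
import math
import re
from collections import Counter

# Hypothetical, illustrative list of Romanian prepositions, conjunctions and
# adverbs. This is an assumption, NOT the feature set used in the paper.
FUNCTION_WORDS = ["și", "sau", "dar", "în", "pe", "la", "cu", "de", "foarte", "acum"]


def features(text):
    """Return the relative frequency of each function word in the text."""
    tokens = re.findall(r"\w+", text.lower())
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid division by zero on empty input
    return [counts[word] / total for word in FUNCTION_WORDS]


def knn_predict(train, text, k=1):
    """Majority vote among the k training vectors closest to the text.

    train is a list of (feature_vector, author) pairs.
    """
    x = features(text)
    nearest = sorted(train, key=lambda item: math.dist(item[0], x))[:k]
    votes = Counter(author for _, author in nearest)
    return votes.most_common(1)[0][0]


# Toy training data (invented sentences and labels, for illustration only).
train_texts = [
    ("author_a", "el merge la piață și la școală și la munte"),
    ("author_b", "ea vine cu mama sau cu tata dar nu cu sora"),
]
train = [(features(text), author) for author, text in train_texts]

print(knn_predict(train, "mergem la mare și la munte"))
```

Relative frequencies are used rather than raw counts so that texts of different lengths stay comparable. A real experiment would also need a held-out test set to measure the error rates the abstract refers to.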
Taxonomy
Topics: Authorship Attribution and Profiling · Natural Language Processing Techniques · Hate Speech and Cyberbullying Detection
