Reddit is all you need: Authorship profiling for Romanian

Ecaterina Ștefănescu; Alexandru-Iulius Jerpelea

arXiv:2410.09907·cs.CL·March 20, 2025

Open Access

TL;DR

This paper introduces a novel Romanian Reddit-based corpus for authorship profiling, demonstrating how LLMs can infer demographic and personal traits from social media texts, and providing a foundation for future NLP research in this area.

Contribution

It creates the first Romanian social media corpus annotated with author traits and evaluates LLMs for authorship profiling, advancing NLP capabilities in this language.

Findings

01. Successfully built a 23k+ sample Romanian Reddit corpus

02. Demonstrated LLMs can infer demographic traits from social media texts

03. Released resources publicly for further research

Abstract

Authorship profiling is the process of identifying an author's characteristics based on their writings. This centuries-old problem has become more intriguing, especially with recent developments in Natural Language Processing (NLP). In this paper, we introduce a corpus of short texts in the Romanian language, annotated with certain author characteristic keywords; to our knowledge, it is the first of its kind. To build it, we use the social media platform Reddit. We leverage its thematic, community-based structure (subreddits), which offers information about the author's background. We infer a user's demographic and some broad personal traits, such as age category, employment status, interests, and social orientation, based on the subreddit and other cues. We thus obtain a corpus of 23k+ samples, extracted from 100+ Romanian subreddits. We analyse our dataset, and…
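The subreddit-based labelling idea described in the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration: the subreddit names, the trait values, and the helper names `label_post` and `build_corpus` are hypothetical and are not taken from the paper. The sketch also ignores the "other cues" the abstract mentions and uses only the subreddit name.

```python
# Hypothetical mapping from subreddit name to author-trait keywords.
# The names and labels below are illustrative assumptions, not the paper's list.
SUBREDDIT_TRAITS = {
    "example_students": {"age_category": "young adult", "employment_status": "student"},
    "example_programming": {"interests": "technology"},
}


def label_post(subreddit: str) -> dict:
    """Return a copy of the trait keywords for a subreddit (empty dict if unknown)."""
    return dict(SUBREDDIT_TRAITS.get(subreddit.lower(), {}))


def build_corpus(posts):
    """Attach subreddit-derived traits to (subreddit, text) pairs.

    Posts from subreddits without a known mapping are skipped.
    """
    corpus = []
    for subreddit, text in posts:
        traits = label_post(subreddit)
        if traits:
            corpus.append({"text": text, "subreddit": subreddit, "traits": traits})
    return corpus


if __name__ == "__main__":
    sample_posts = [
        ("example_students", "Am examen mâine."),
        ("unknown_sub", "Un text fără etichetă."),
    ]
    for row in build_corpus(sample_posts):
        print(row)
```

Under these assumptions, each retained text inherits the labels of the community it was posted in, so the labels are inferred from context rather than stated by the author.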

Taxonomy

Topics: Authorship Attribution and Profiling