RoIt-XMASA: Multi-Domain Multilingual Sentiment Analysis Dataset for Romanian and Italian
Andrei-Marius Avram, Aureliu Valentin Antonie, Cosmin-Mircea Croitoru, Vlad Andrei Muntean, Dumitru-Clementin Cercel

TL;DR
This paper introduces RoIt-XMASA, a multilingual sentiment analysis dataset for Romanian and Italian, and proposes a multi-target adversarial training framework that improves cross-lingual and cross-domain sentiment classification.
Contribution
The paper presents a new Romanian and Italian sentiment analysis dataset and a multi-target adversarial training method that improves sentiment classification across languages and domains.
Findings
XLM-R achieves 66.23% F1-score with the proposed method.
The proposed method outperforms the baseline by 4.64%.
Few-shot Llama-3.1-8B achieves 58.43% F1-score, showing a trade-off between the efficiency of prompting and the higher performance of task-specific fine-tuning.
Abstract
We present RoIt-XMASA, a multilingual dataset that extends the Cross-lingual Multi-domain Amazon Sentiment Analysis to Italian and Romanian, comprising 36,000 labeled reviews across three domains (books, movies, and music) and 202,141 unlabeled samples. To address cross-lingual and cross-domain challenges, we propose a multi-target adversarial training framework that employs loss reversal with meta-learned coefficients to dynamically balance sentiment discrimination with domain and language invariance. XLM-R achieves an F1-score of 66.23% with our approach, outperforming the baseline by 4.64%. Few-shot evaluation shows that Llama-3.1-8B achieves 58.43% F1-score, revealing a meaningful trade-off between the efficiency of prompting-based approaches and the higher performance of task-specific fine-tuning.
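The abstract describes the training objective only at a high level, so the following is a minimal sketch of what "loss reversal" could look like, not the authors' implementation. It assumes the encoder-side objective is the sentiment loss minus weighted domain and language discriminator losses, so the encoder is rewarded when the discriminators fail. The function names, the subtraction form, and the coefficients `lambda_dom` and `lambda_lang` are all assumptions; in the paper the coefficients are meta-learned, whereas here they are plain floats supplied by the caller.

```python
import math


def cross_entropy(probs, label):
    """Negative log-likelihood of the gold label under a probability vector."""
    return -math.log(max(probs[label], 1e-12))


def multi_target_loss(sent_probs, sent_label,
                      dom_probs, dom_label,
                      lang_probs, lang_label,
                      lambda_dom, lambda_lang):
    """Hypothetical encoder-side objective with loss reversal.

    The sentiment loss is minimised, while the domain and language
    discriminator losses enter with a negative sign, which pushes the
    encoder toward domain- and language-invariant features.
    lambda_dom and lambda_lang stand in for the meta-learned
    coefficients mentioned in the abstract (assumed fixed here).
    """
    l_sent = cross_entropy(sent_probs, sent_label)
    l_dom = cross_entropy(dom_probs, dom_label)
    l_lang = cross_entropy(lang_probs, lang_label)
    return l_sent - lambda_dom * l_dom - lambda_lang * l_lang


# Example: a confident sentiment prediction with fully confused
# domain (3 domains) and language (2 languages) discriminators.
example_loss = multi_target_loss(
    sent_probs=[0.1, 0.9], sent_label=1,
    dom_probs=[1 / 3, 1 / 3, 1 / 3], dom_label=0,
    lang_probs=[0.5, 0.5], lang_label=1,
    lambda_dom=0.1, lambda_lang=0.1,
)
```

Under this sketch, a discriminator that confidently identifies the domain or language contributes a smaller subtracted term, so the encoder's loss is higher than when the discriminators are confused. The two-class sentiment vector in the example is illustrative only; the abstract does not state the number of sentiment classes.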
