MASIVE: Open-Ended Affective State Identification in English and Spanish
Nicholas Deas, Elsbeth Turcan, Iván Pérez Mejía, and Kathleen McKeown

TL;DR
This paper introduces MASIVE, a multilingual dataset of Reddit posts capturing a broad range of affective states in English and Spanish, and proposes a new affective state identification task for language models, demonstrating the importance of native data and model fine-tuning.
Contribution
The work presents MASIVE, a large multilingual dataset of affective states, and defines a novel affective state identification task. It shows that smaller finetuned models outperform much larger LLMs on this task and that data written by native speakers is important for performance.
Findings
Smaller finetuned models outperform larger LLMs in affective state identification.
Pretraining on MASIVE improves performance on emotion benchmarks.
Native speaker data is crucial for effective affective state identification.
Abstract
In the field of emotion analysis, much NLP research focuses on identifying a limited number of discrete emotion categories, often applied across languages. These basic sets, however, are rarely designed with textual data in mind, and culture, language, and dialect can influence how particular emotions are interpreted. In this work, we broaden our scope to a practically unbounded set of affective states, which includes any terms that humans use to describe their experiences of feeling. We collect and publish MASIVE, a dataset of Reddit posts in English and Spanish containing over 1,000 unique affective states each. We then define the new problem of affective state identification for language generation models, framed as a masked span prediction task. On this task, we find that smaller finetuned multilingual models outperform much larger LLMs, even on region-specific…
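To make the masked span prediction framing concrete, the following is a minimal sketch of how one input/target pair might be built from a post. It is an illustration under stated assumptions, not the authors' pipeline: the function name `mask_affective_state`, the T5-style sentinel token `<extra_id_0>`, and the example sentence are all hypothetical choices that the abstract does not specify.

```python
import re

# Assumption: a T5-style sentinel token marks the masked span.
# The abstract does not say which mask token the authors use.
SENTINEL = "<extra_id_0>"


def mask_affective_state(text, state, sentinel=SENTINEL):
    """Replace the first whole-word occurrence of `state` with a sentinel.

    Returns a (masked_input, target) pair, or None if `state` does not
    occur in `text`. This helper is hypothetical and for illustration only.
    """
    pattern = re.compile(r"\b" + re.escape(state) + r"\b", flags=re.IGNORECASE)
    match = pattern.search(text)
    if match is None:
        return None
    # Input: the post with the affective state hidden behind the sentinel.
    masked = text[:match.start()] + sentinel + text[match.end():]
    # Target: the sentinel followed by the span the model should generate.
    target = f"{sentinel} {match.group(0)}"
    return masked, target


pair = mask_affective_state("I feel overwhelmed by all of this work.", "overwhelmed")
print(pair)
```

Under this framing, a model is scored on whether the span it generates for the sentinel matches the affective state the author actually wrote, which is what lets the label set remain open-ended rather than fixed to a small list of emotion categories.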
Taxonomy
Topics: Deception detection and forensic psychology
Methods: Sparse Evolutionary Training
