Breaking Bad: Norms for Valence, Arousal, and Dominance for over 10k English Multiword Expressions
Saif M. Mohammad

TL;DR
This paper introduces an expanded human-rated lexicon for over 10,000 multiword expressions and 25,000 words, capturing valence, arousal, and dominance, to facilitate research across NLP, psychology, and social sciences.
Contribution
It extends the NRC VAD Lexicon (v1, 2018) with human ratings of valence, arousal, and dominance for 10k English MWEs and their constituent words, widens unigram coverage, and enables analysis of the emotionality and compositionality of MWEs.
Findings
The VAD associations in the lexicon are highly reliable
MWEs (idioms, noun compounds, and verb particle constructions) differ in how strongly emotional they are
The lexicon supports research across NLP, psychology, and the social sciences
Abstract
Factor analysis studies have shown that the primary dimensions of word meaning are Valence (V), Arousal (A), and Dominance (D). Existing lexicons such as the NRC VAD Lexicon, published in 2018, include VAD association ratings for words. Here, we present a complement to it, which has human ratings of valence, arousal, and dominance for 10k English Multiword Expressions (MWEs) and their constituent words. We also increase the coverage of unigrams, especially words that have become more common since 2018. In all, the new NRC VAD Lexicon v2 now has entries for 10k MWEs and 25k words, in addition to the entries in v1. We show that the associations are highly reliable. We use the lexicon to examine emotional characteristics of MWEs, including: 1. The degree to which MWEs (idioms, noun compounds, and verb particle constructions) exhibit strong emotionality; 2. The degree of emotional…
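One way to make the compositionality question concrete is to compare the score of an MWE with the mean score of its constituent words. The following is a minimal Python sketch under stated assumptions, not the paper's method: the function name `constituent_gap`, the dictionary layout, the whitespace tokenization, and the averaging rule are all hypothetical, and the toy entries are invented placeholders rather than values from the lexicon.

```python
from statistics import mean

# Invented placeholder entries for illustration only.
# These are NOT real lexicon values and "word_a"/"word_b" are not real terms.
toy_valence = {
    "word_a": 0.5,
    "word_b": 0.25,
    "word_a word_b": -0.5,  # hypothetical MWE built from the two words above
}


def constituent_gap(expression, scores):
    """Return the expression's score minus the mean score of its words.

    A gap near zero suggests the expression's score is close to what its
    parts would predict; a large gap suggests it is not. Splitting on
    whitespace is an assumption of this sketch.
    """
    words = expression.split()
    missing = [term for term in [expression, *words] if term not in scores]
    if missing:
        raise KeyError(f"no score for: {missing}")
    return scores[expression] - mean(scores[w] for w in words)


gap = constituent_gap("word_a word_b", toy_valence)
print(f"gap between MWE and mean of constituents: {gap:+.3f}")
```

The same function could be applied to arousal or dominance scores by passing a different dictionary. How the real lexicon file is laid out, and which scale its scores use, should be checked against the lexicon's own documentation before adapting this.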
Taxonomy
Topics: Mental Health via Writing · Neurobiology of Language and Bilingualism · Sentiment Analysis and Opinion Mining
