TL;DR
This paper explores whether generalized sounds and music can share a common emotional space, using multi-domain learning on combined datasets to improve the prediction of emotion in terms of arousal and valence.
Contribution
It introduces a joint learning approach that combines features from sounds and music in a common feature space to improve emotion prediction, outperforming state-of-the-art methods in both domains.
Findings
Joint learning improves emotion prediction accuracy.
A shared emotional space for sounds and music is feasible.
The method outperforms the state of the art in both domains.
Abstract
In this study, we aim to determine whether generalized sounds and music can share a common emotional space, improving predictions of emotion in terms of arousal and valence. We propose the use of multiple datasets as a multi-domain learning technique. Our approach involves creating a common space encompassing features that characterize both generalized sounds and music, as both can evoke emotions in a similar manner. To achieve this, we utilized two publicly available datasets, namely IADS-E and PMEmo, following a standardized experimental protocol. We employed a wide variety of features that capture diverse aspects of the audio structure, including key parameters of spectrum, energy, and voicing. Subsequently, we performed joint learning on the common feature space, leveraging heterogeneous model architectures. Interestingly, this synergistic scheme outperforms the state-of-the-art in both…
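The abstract describes pooling features from two datasets into a common feature space and training on it jointly to predict arousal and valence. The sketch below illustrates only that general idea, under loudly stated assumptions: it uses a single closed-form ridge regressor (the paper uses heterogeneous model architectures that are not specified here), synthetic data standing in for IADS-E and PMEmo, a hypothetical 20-dimensional feature vector, and hypothetical function names. It is not the authors' implementation.

```python
import numpy as np

# Hypothetical sketch: the feature dimensionality, the ridge model, and the
# synthetic data are illustrative assumptions, not the paper's setup.

def standardize(X, mean, std):
    """Z-score features with statistics computed on the pooled training set."""
    return (X - mean) / std

def fit_ridge(X, Y, alpha=1.0):
    """Closed-form ridge regression; Y columns are [arousal, valence]."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])  # append bias column
    reg = alpha * np.eye(Xb.shape[1])
    reg[-1, -1] = 0.0  # leave the bias term unpenalized
    return np.linalg.solve(Xb.T @ Xb + reg, Xb.T @ Y)

def predict(W, X):
    """Apply the fitted weights (including bias) to standardized features."""
    Xb = np.hstack([X, np.ones((X.shape[0], 1))])
    return Xb @ W

def rmse(Y_true, Y_pred):
    """RMSE per output dimension, i.e. [arousal, valence]."""
    return np.sqrt(np.mean((Y_true - Y_pred) ** 2, axis=0))

def joint_train(domains, alpha=1.0):
    """Pool every domain's (features, targets) into one common space and fit one model."""
    X_all = np.vstack([X for X, _ in domains])
    Y_all = np.vstack([Y for _, Y in domains])
    mean = X_all.mean(axis=0)
    std = X_all.std(axis=0) + 1e-8  # avoid division by zero for constant features
    W = fit_ridge(standardize(X_all, mean, std), Y_all, alpha)
    return W, mean, std

def make_synthetic_domain(rng, n, W_true, shift, noise=0.1):
    """Synthetic stand-in for one dataset: both domains share the mapping W_true
    to (arousal, valence) but differ in their feature distribution (shift)."""
    X = rng.normal(loc=shift, scale=1.0, size=(n, W_true.shape[0]))
    Y = X @ W_true + rng.normal(scale=noise, size=(n, 2))
    return X, Y

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W_true = rng.normal(size=(20, 2))  # 20 hypothetical audio features
    sounds = make_synthetic_domain(rng, 200, W_true, shift=0.0)  # stands in for IADS-E
    music = make_synthetic_domain(rng, 200, W_true, shift=0.5)   # stands in for PMEmo
    W, mean, std = joint_train([sounds, music])
    # Training-set RMSE per domain, for illustration only (no held-out split).
    for name, (X, Y) in [("sounds", sounds), ("music", music)]:
        print(name, rmse(Y, predict(W, standardize(X, mean, std))))
```

The synthetic domains are built to share one feature-to-emotion mapping while differing in feature distribution, which is the situation in which pooling data into a common space is expected to help; real audio features would not be guaranteed to behave this way.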
