Enabling Embodied Analogies in Intelligent Music Systems
Fabio Paolizzo

TL;DR
This paper presents a multidisciplinary approach to cross-modal machine learning in music, combining emotion recognition from audio and lyrics with motion capture data of dancers to enhance embodied understanding in intelligent music systems.
Contribution
It introduces a novel dataset combining music, lyrics, and motion capture, and applies machine learning techniques to analyze emotional content across modalities.
Findings
Successful emotion classification from music and lyrics
Effective integration of motion capture data
Enhanced cross-modal understanding in music systems
Abstract
The present methodology targets cross-modal machine learning and draws on tools and methods from a broad range of disciplines, including music, systematic musicology, dance, motion capture, human-computer interaction, computational linguistics and audio signal processing. The main tasks are: (1) adapting wisdom-of-the-crowd approaches to embodiment in music and dance performance in order to create a dataset of music and music lyrics that covers a variety of emotions, (2) applying audio- and language-informed machine learning techniques to that dataset to automatically identify the emotional content of the music and the lyrics, and (3) integrating motion capture data, recorded with a Vicon system, of dancers performing to that music.
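Task (1) depends on turning many individual emotion judgments into one label per piece. The abstract does not say how the crowd responses are aggregated, so the following is only a minimal sketch, assuming a plain majority vote over categorical emotion labels. The function name `aggregate_crowd_labels`, the tie-breaking rule and the example labels are all hypothetical. For each item it returns the winning label and the fraction of annotators who chose it, a simple agreement measure.

```python
from collections import Counter


def aggregate_crowd_labels(annotations):
    """Majority-vote aggregation of crowd emotion labels (hypothetical sketch).

    annotations: dict mapping an item id (e.g. a song) to the list of
    categorical emotion labels given by individual annotators.
    Returns a dict mapping item id -> (winning label, agreement ratio),
    where the agreement ratio is the share of annotators who chose the winner.
    Items with no annotations are skipped.
    """
    result = {}
    for item, labels in annotations.items():
        if not labels:
            continue
        counts = Counter(labels)
        top = max(counts.values())
        # Assumption: ties are broken alphabetically so the output is deterministic.
        winner = min(label for label, count in counts.items() if count == top)
        result[item] = (winner, top / len(labels))
    return result


if __name__ == "__main__":
    # Made-up example labels, not data from the paper.
    example = {
        "song_a": ["joy", "joy", "calm", "joy"],
        "song_b": ["sadness", "anger", "sadness"],
    }
    for song, (label, agreement) in aggregate_crowd_labels(example).items():
        print(song, label, round(agreement, 2))
```

Items with low agreement could be flagged for re-annotation or left out of the dataset; any such threshold is a design choice that the abstract does not describe.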
Taxonomy
Topics: Music Technology and Sound Studies · Music and Audio Processing · Human Motion and Animation
