MOSA: Music Motion with Semantic Annotation Dataset for Cross-Modal Music Processing

Yu-Fen Huang; Nikki Moran; Simon Coleman; Jon Kelly; Shun-Hwa Wei; Po-Yin Chen; Yun-Hsin Huang; Tsung-Ping Chen; Yu-Chia Kuo; Yu-Chi Wei; Chih-Hsuan Li; Da-Yu Huang; Hsuan-Kai Kao; Ting-Wei Lin; Li Su

arXiv:2406.06375·cs.SD·June 11, 2024

TL;DR

The paper introduces MOSA, a large-scale, multi-modal music dataset with detailed annotations, enabling advanced research in cross-modal music retrieval and generation tasks.

Contribution

The paper presents the MOSA dataset, which the authors describe as the largest cross-modal music dataset with note-level annotations to date, supporting new research in music information retrieval and content generation.

Findings

1. Demonstrated tasks include beat, phrase, and expressive content detection from multiple data modalities.
2. Generated musicians' body motion from music audio using the dataset.
3. Provided a comprehensive benchmark for cross-modal music processing.

Abstract

In cross-modal music processing, translation between visual, auditory, and semantic content opens up new possibilities as well as challenges. The construction of such a transformative scheme depends upon a benchmark corpus with a comprehensive data infrastructure. In particular, the assembly of a large-scale cross-modal dataset presents major challenges. In this paper, we present the MOSA (Music mOtion with Semantic Annotation) dataset, which contains high-quality 3-D motion capture data, aligned audio recordings, and note-by-note semantic annotations of pitch, beat, phrase, dynamic, articulation, and harmony for 742 professional music performances by 23 professional musicians, comprising more than 30 hours and 570K notes of data. To our knowledge, this is the largest cross-modal music dataset with note-level annotations to date. To demonstrate the usage of the MOSA dataset, we present…

Code & Models

Repositories

yufenhuang/mosa-music-motion-and-semantic-annotation-dataset
tf · Official
