VioPose: Violin Performance 4D Pose Estimation by Hierarchical Audiovisual Inference
Seong Jong Yoo, Snehesh Shrestha, Irina Muresanu, Cornelia Fermüller

TL;DR
VioPose is a multimodal hierarchical network that estimates precise 4D violinist poses (3D pose over time) from monocular video by leveraging audiovisual cues. It outperforms existing visual pose estimation methods and is accompanied by a large, diverse violin performance dataset.
Contribution
The paper introduces VioPose, a novel audiovisual hierarchical model for 4D pose estimation that addresses limitations of visual-only methods in capturing subtle musical motions.
Findings
Outperforms state-of-the-art visual pose estimation methods
Accurately captures fast and subtle violin movements
Provides a large, diverse dataset for violin performance analysis
Abstract
Musicians delicately control their bodies to generate music. Sometimes, their motions are too subtle to be captured by the human eye. To analyze how they move to produce the music, we need to estimate precise 4D human pose (3D pose over time). However, current state-of-the-art (SoTA) visual pose estimation algorithms struggle to produce accurate monocular 4D poses because of occlusions, partial views, and human-object interactions. They are limited by the viewing angle, pixel density, and sampling rate of the cameras and fail to estimate fast and subtle movements, such as in the musical effect of vibrato. We leverage the direct causal relationship between the music produced and the human motions creating it to address these challenges. We propose VioPose: a novel multimodal network that hierarchically estimates dynamics. High-level features are cascaded to low-level features and…
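To make the term "4D pose (3D pose over time)" concrete, the following is a minimal sketch of one possible data layout: a sequence of frames, each holding one 3D position per joint. It also derives per-joint velocity by forward differences as a simple example of a motion "dynamic". This is purely illustrative and is not the VioPose architecture. The names `Pose4D` and `joint_velocities`, the two-joint clip, and the 30 fps rate are all assumptions made for the example.

```python
from typing import List, Tuple

Joint = Tuple[float, float, float]   # (x, y, z) position of one joint
Frame = List[Joint]                  # all joints at one time step
Pose4D = List[Frame]                 # frames over time, i.e. shape (T, J, 3)


def joint_velocities(pose: Pose4D, fps: float) -> List[List[Joint]]:
    """Forward-difference velocity of every joint between consecutive frames.

    Returns T-1 frames of (vx, vy, vz) in position units per second.
    """
    dt = 1.0 / fps
    velocities = []
    for prev, curr in zip(pose, pose[1:]):
        velocities.append([
            ((cx - px) / dt, (cy - py) / dt, (cz - pz) / dt)
            for (px, py, pz), (cx, cy, cz) in zip(prev, curr)
        ])
    return velocities


# Hypothetical clip: 3 frames, 2 joints, sampled at an assumed 30 fps.
# Joint 0 drifts along y by 0.1 units per frame; joint 1 stays still.
clip: Pose4D = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.1, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 0.2, 0.0), (1.0, 0.0, 0.0)],
]
vel = joint_velocities(clip, fps=30.0)
```

In this layout the time resolution of any derived quantity is bounded by the frame rate, which is one way to read the abstract's remark that cameras limit how well fast, subtle motion can be recovered from video alone.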
Taxonomy
Topics: Music Technology and Sound Studies · Music and Audio Processing · Neuroscience and Music Perception
