AIMusicGuru: Music Assisted Human Pose Correction
Snehesh Shrestha, Cornelia Fermüller, Tianyu Huang, Pyone Thant Win, Adam Zukerman, Chethan M. Parameshwara, Yiannis Aloimonos

TL;DR
This paper introduces MAPnet, a novel audio-visual approach that leverages sound to improve human pose estimation, especially in challenging scenarios such as instrument playing, supported by a new dataset, MAPdat.
Contribution
It presents MAPnet, a new multi-modal model that uses audio to refine human pose predictions, and releases MAPdat, a dataset for violin playing motion and sound analysis.
Findings
Audio improves pose estimation accuracy.
Multi-modal approaches outperform visual-only methods.
MAPnet shows significant qualitative and quantitative improvements.
Abstract
Pose estimation techniques rely on visual cues available through observations represented in the form of pixels. Their performance, however, is bounded by the frame rate of the video and suffers from motion blur, occlusions, and a lack of temporal coherence. These issues are magnified when people interact with objects and instruments, for example when playing the violin. Standard postprocessing approaches use interpolation and smoothing functions to filter noise and fill gaps, but they cannot model highly non-linear motion. We present a method that leverages the strong causal relationship between the sound produced and the motion that produces it. We use the audio signature to refine and predict accurate human body pose motion models. We propose MAPnet (Music Assisted Pose network) for generating a fine-grained motion model from sparse input pose sequences but…
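The abstract contrasts MAPnet with standard postprocessing based on interpolation and smoothing. As a rough illustration of that baseline only (not of MAPnet itself), the sketch below upsamples a sparse pose sequence by linear interpolation and then applies a moving average. The function names `upsample_pose` and `smooth_pose`, the `(frames, joints, dims)` array layout, and the window size are assumptions made for this example and do not come from the paper.

```python
import numpy as np


def upsample_pose(sparse_pose, factor):
    """Linearly interpolate a pose sequence of shape (T, J, D) in time.

    Hypothetical helper: inserts (factor - 1) frames between each pair
    of consecutive input frames.
    """
    sparse_pose = np.asarray(sparse_pose, dtype=float)
    n_in = sparse_pose.shape[0]
    t_in = np.arange(n_in)
    t_out = np.linspace(0, n_in - 1, (n_in - 1) * factor + 1)
    # Flatten joints and coordinates so each column is one 1-D trajectory.
    flat = sparse_pose.reshape(n_in, -1)
    dense = np.stack(
        [np.interp(t_out, t_in, flat[:, k]) for k in range(flat.shape[1])],
        axis=1,
    )
    return dense.reshape(len(t_out), *sparse_pose.shape[1:])


def smooth_pose(pose, window=3):
    """Moving-average smoothing along the time axis (odd window only)."""
    if window % 2 == 0:
        raise ValueError("window must be odd")
    pose = np.asarray(pose, dtype=float)
    pad = window // 2
    # Repeat the first and last frames so the output keeps its length.
    padded = np.pad(pose, [(pad, pad)] + [(0, 0)] * (pose.ndim - 1), mode="edge")
    kernel = np.ones(window) / window
    flat = padded.reshape(padded.shape[0], -1)
    out = np.stack(
        [np.convolve(flat[:, k], kernel, mode="valid") for k in range(flat.shape[1])],
        axis=1,
    )
    return out.reshape(pose.shape)


if __name__ == "__main__":
    # One joint whose true motion is a sine wave, observed only at its zero crossings.
    t_sparse = np.array([0.0, 2.0, 4.0])
    sparse = np.sin(2 * np.pi * t_sparse / 4.0).reshape(-1, 1, 1)
    dense = smooth_pose(upsample_pose(sparse, factor=4))
    # The reconstruction stays close to zero, although the true motion
    # reaches an amplitude of 1 between the observed frames.
    print(np.abs(dense).max())
```

In the toy case at the bottom, the fast oscillation falls entirely between the observed frames, so interpolation has nothing to recover it from. This is the kind of gap the abstract proposes to fill using the audio signal.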
Taxonomy
Topics: Human Pose and Action Recognition · Human Motion and Animation · Hand Gesture Recognition Systems
