Audio Matters Too! Enhancing Markerless Motion Capture with Audio Signals for String Performance Capture

Yitong Jin; Zhiping Qiu; Yi Shi; Shuangpeng Sun; Chongwu Wang; Donghao Pan; Jiachen Zhao; Zhenghao Liang; Yuan Wang; Xiaobing Li; Feng Yu; Tao Yu; Qionghai Dai

arXiv:2405.04963·cs.MM·May 9, 2024



TL;DR

This paper introduces a multi-modal, markerless motion capture framework that leverages audio signals to improve the accuracy of capturing subtle string instrument performances, validated on a novel large-scale dataset.

Contribution

It presents the first multi-view, multi-modal dataset for string instrument performance and proposes an audio-guided motion capture method that enhances visual results without external markers.

Findings

01. Outperforms state-of-the-art vision-based algorithms
02. Effectively captures subtle hand-string contact movements
03. Demonstrates the benefit of integrating audio cues into motion capture

Abstract

In this paper, we touch on the problem of markerless multi-modal human motion capture especially for string performance capture which involves inherently subtle hand-string contacts and intricate movements. To fulfill this goal, we first collect a dataset, named String Performance Dataset (SPD), featuring cello and violin performances. The dataset includes videos captured from up to 23 different views, audio signals, and detailed 3D motion annotations of the body, hands, instrument, and bow. Moreover, to acquire the detailed motion annotations, we propose an audio-guided multi-modal motion capture framework that explicitly incorporates hand-string contacts detected from the audio signals for solving detailed hand poses. This framework serves as a baseline for string performance capture in a completely markerless manner without imposing any external devices on performers, eliminating the…
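The abstract says hand-string contacts are detected from the audio signals. As a rough illustration of why audio carries that information, and not as the paper's method, the sketch below uses the ideal-string relation: frequency is inversely proportional to vibrating length, so a detected pitch fixes where a finger must stop a given string. The function names, the standard cello tuning table, and the `max_fraction` reach limit are all assumptions introduced here for illustration.

```python
# Illustrative sketch only: ideal-string physics, not the framework from the paper.
# Assumed open-string fundamentals (Hz) for a cello in standard tuning, A4 = 440 Hz.
CELLO_OPEN_STRINGS_HZ = {"C": 65.41, "G": 98.00, "D": 146.83, "A": 220.00}


def contact_fraction(pitch_hz, open_hz):
    """Distance from the nut to the stopping finger, as a fraction of string length.

    For an ideal string, frequency is inversely proportional to the vibrating
    length, so the vibrating part is open_hz / pitch_hz of the full string.
    """
    if pitch_hz < open_hz:
        raise ValueError("pitch is below the open-string frequency")
    return 1.0 - open_hz / pitch_hz


def candidate_contacts(pitch_hz, max_fraction=0.75):
    """Return {string name: contact fraction} for strings that could sound pitch_hz.

    max_fraction is a hypothetical limit on how far along the string a finger
    can reach; pitch alone leaves the choice of string ambiguous.
    """
    candidates = {}
    for name, open_hz in CELLO_OPEN_STRINGS_HZ.items():
        if pitch_hz >= open_hz:
            fraction = contact_fraction(pitch_hz, open_hz)
            if fraction <= max_fraction:
                candidates[name] = fraction
    return candidates


if __name__ == "__main__":
    # A 440 Hz note on the A string (220 Hz open) is stopped at the midpoint.
    print(contact_fraction(440.0, 220.0))
    print(candidate_contacts(220.0))
```

A single pitch usually maps to several possible strings, which suggests why audio-derived contacts are most useful alongside the visual estimate instead of as a replacement for it.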


Code & Models

Repositories

Yitongishere/string_performance
jax · Official
