ELGAR: Expressive Cello Performance Motion Generation for Audio Rendition

Zhiping Qiu; Yitong Jin; Yuan Wang; Yi Shi; Chongwu Wang; Chao Tan; Xiaobing Li; Feng Yu; Tao Yu; Qionghai Dai

arXiv:2505.04203 · cs.GR · July 2, 2025

TL;DR

ELGAR is a diffusion-based framework that generates expressive whole-body cello performance motions from audio, incorporating interactive contact losses and novel evaluation metrics, advancing realistic instrument motion synthesis.

Contribution

The paper introduces ELGAR, a novel diffusion model for audio-driven cello motion generation, with new contact loss functions and specialized metrics for performance evaluation.

Findings

01. ELGAR effectively generates realistic, fast, and complex cello motions.

02. The proposed contact losses improve interaction authenticity.

03. New metrics correlate well with motion-audio semantic alignment.

Abstract

The art of instrument performance stands as a vivid manifestation of human creativity and emotion. Nonetheless, generating instrument performance motions is a highly challenging task, as it requires not only capturing intricate movements but also reconstructing the complex dynamics of the performer-instrument interaction. While existing works primarily focus on modeling partial body motions, we propose Expressive ceLlo performance motion Generation for Audio Rendition (ELGAR), a state-of-the-art diffusion-based framework for whole-body fine-grained instrument performance motion generation solely from audio. To emphasize the interactive nature of the instrument performance, we introduce Hand Interactive Contact Loss (HICL) and Bow Interactive Contact Loss (BICL), which effectively guarantee the authenticity of the interplay. Moreover, to better evaluate whether the generated motions…
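The abstract names HICL and BICL but does not give their formulas here. As a rough illustration of what a contact-style loss can look like in general, the sketch below penalizes the squared distance between a body point (say, a fingertip) and its intended contact point (say, a position on a string) on frames flagged as in contact. This is a generic, hypothetical formulation and not the paper's definition; the function name `contact_loss`, its arguments, and the toy data are all assumptions made for illustration.

```python
def contact_loss(points, targets, in_contact):
    """Mean squared distance between points and their contact targets,
    averaged over the frames flagged as in contact.

    Hypothetical formulation for illustration only; not the HICL/BICL
    definition from the paper.
    """
    total, count = 0.0, 0
    for p, t, c in zip(points, targets, in_contact):
        if c:
            # Squared Euclidean distance for this frame.
            total += sum((a - b) ** 2 for a, b in zip(p, t))
            count += 1
    # No contact frames means nothing to penalize.
    return total / count if count else 0.0


# Toy data (assumed): three frames of a fingertip and its target string point.
fingertip = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (0.5, 0.5, 0.5)]
string_pt = [(0.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 0.0, 0.0)]
pressed = [True, True, False]  # the third frame is not a contact frame

loss = contact_loss(fingertip, string_pt, pressed)
print(loss)
```

The third frame is far from the string but contributes nothing, since it is not flagged as a contact frame. Masking by contact state is what separates a loss of this kind from a plain position error.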
