MG-Former: A Transformer-Based Framework for Music-Driven 3D Conducting Gesture Generation

Ke Qiu; Yawen Qin; Tianzhi Jia; Xiaole Yang; Kaimin Wang; Kaixing Yang

arXiv:2605.01197·cs.SD·May 5, 2026

TL;DR

This paper introduces TransConductor, a Transformer-based framework that generates realistic 3D conducting gestures from music. It is supported by a new dataset (ConductorMotion), a Trans-Temporal Music Encoder, and a retrieval-based evaluation method.

Contribution

It presents a novel Transformer framework for music-driven conducting gesture synthesis, together with a dataset of detailed body motion recovered from conducting videos and a new evaluation protocol for artistic alignment.

Findings

01. TransConductor outperforms existing dance and conducting generation baselines.

02. The Transformer backbone and alignment loss improve gesture-music synchronization.

03. The dataset ConductorMotion enables detailed 3D conducting gesture analysis.

Abstract

Generating expressive conducting gestures from music is a challenging cross-modal motion synthesis problem: the output must follow long-range musical structure, preserve beat-level synchronization, and remain plausible as a fine-grained 3D human performance. Existing conducting-motion studies are often limited by sparse pose representations, small-scale data, or evaluation protocols that do not directly measure whether music and gesture are mutually aligned. This paper presents TransConductor, a Transformer-based framework for music-driven conducting gesture generation. We introduce ConductorMotion, a SMPL-parameter data construction pipeline that recovers detailed body motion from conducting videos and forms a dataset targeted at professional conducting gestures. Given acoustic descriptors extracted from audio and an initial pose, TransConductor uses a Trans-Temporal Music Encoder and…
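
The abstract notes that existing evaluation protocols do not directly measure whether music and gesture are mutually aligned, and the summary above mentions a retrieval-based evaluation. The paper's actual protocol is not described on this page, so the following is only a minimal sketch of one common form of retrieval-based evaluation: Recall@k over paired embeddings. It assumes that music clips and gesture clips have already been mapped into a shared embedding space by some encoder, and that index i in both lists refers to the same clip. The function names `cosine` and `recall_at_k` and the toy vectors are hypothetical and are not taken from the paper.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def recall_at_k(music_emb, gesture_emb, k=1):
    """Fraction of music clips whose paired gesture clip (same index)
    appears among the k most similar gesture clips."""
    hits = 0
    for i, m in enumerate(music_emb):
        sims = [cosine(m, g) for g in gesture_emb]
        # Rank gesture indices from most to least similar to this music clip.
        ranked = sorted(range(len(sims)), key=lambda j: sims[j], reverse=True)
        if i in ranked[:k]:
            hits += 1
    return hits / len(music_emb)


# Toy embeddings (assumed, for illustration only): the third pair is
# deliberately mismatched, so it is not retrieved at k=1.
music = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
gesture = [[0.9, 0.1], [0.2, 0.8], [-1.0, 0.5]]

r1 = recall_at_k(music, gesture, k=1)
r3 = recall_at_k(music, gesture, k=3)
```

A higher Recall@k means generated or recorded gestures are easier to match back to their music, which is one way to quantify mutual alignment. Running the same function with the two arguments swapped gives the gesture-to-music direction.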
