BiTDiff: Fine-Grained 3D Conducting Motion Generation via BiMamba-Transformer Diffusion

Tianzhi Jia; Kaixing Yang; Xiaole Yang; Xulong Tang; Ke Qiu; Shikui Wei; Yao Zhao

arXiv:2604.04395·cs.CV·April 24, 2026

TL;DR

This paper introduces BiTDiff, a diffusion-based framework that uses a BiMamba-Transformer hybrid model to generate 3D conducting motion from music efficiently and with high quality, supported by a new large-scale dataset, CM-Data.

Contribution

The paper contributes a new dataset for 3D conducting motion and a model architecture that supports long-sequence, fine-grained motion synthesis with state-of-the-art results.

Findings

01

BiTDiff outperforms previous methods on CM-Data.

02

To the authors' knowledge, CM-Data (about 10 hours of fine-grained SMPL-X data) is the first and largest public dataset for 3D conducting motion generation.

03

BiTDiff enables training-free joint-level motion editing.

Abstract

3D conducting motion generation aims to synthesize fine-grained conductor motions from music, with broad potential in music education, virtual performance, digital human animation, and human-AI co-creation. However, this task remains underexplored due to two major challenges: (1) the lack of large-scale fine-grained 3D conducting datasets and (2) the absence of effective methods that can jointly support long-sequence generation with high quality and efficiency. To address the data limitation, we develop a quality-oriented 3D conducting motion collection pipeline and construct CM-Data, a fine-grained SMPL-X dataset with about 10 hours of conducting motion data. To the best of our knowledge, CM-Data is the first and largest public dataset for 3D conducting motion generation. To address the methodological limitation, we propose BiTDiff, a novel framework for 3D conducting motion…
