BAD: Bidirectional Auto-regressive Diffusion for Text-to-Motion Generation
S. Rohollah Hosseyni, Ali Ahmad Rahmani, S. Jamal Seyedmohammadi, Sanaz Seyedin, Arash Mohammadi

TL;DR
BAD introduces a novel bidirectional autoregressive diffusion model that combines the strengths of autoregressive and mask-based approaches, improving text-to-motion generation by effectively capturing complex sequence dependencies.
Contribution
It unifies autoregressive and mask-based models using permutation-based corruption, enabling better modeling of bidirectional and sequential dependencies.
Findings
Outperforms existing autoregressive and mask-based models in text-to-motion tasks.
Demonstrates the effectiveness of permutation-based corruption for sequence modeling.
Provides a new pre-training strategy for sequence generation.
Abstract
Autoregressive models excel in modeling sequential dependencies by enforcing causal constraints, yet they struggle to capture complex bidirectional patterns due to their unidirectional nature. In contrast, mask-based models leverage bidirectional context, enabling richer dependency modeling. However, they often assume token independence during prediction, which undermines the modeling of sequential dependencies. Additionally, the corruption of sequences through masking or absorption can introduce unnatural distortions, complicating the learning process. To address these issues, we propose Bidirectional Autoregressive Diffusion (BAD), a novel approach that unifies the strengths of autoregressive and mask-based generative models. BAD utilizes a permutation-based corruption technique that preserves the natural sequence structure while enforcing causal dependencies through randomized…
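To make the idea of "causal dependencies through randomized ordering" concrete, here is a minimal sketch of a permuted-order causal attention mask, the kind used in permutation language modeling generally. It is an illustration under assumptions, not the authors' implementation: the function name `permutation_causal_mask` and its parameters are hypothetical, and the abstract does not specify how BAD builds its masks or corrupts tokens.

```python
import random


def permutation_causal_mask(seq_len, rng):
    """Sample a random generation order and build the matching causal mask.

    Hypothetical helper for illustration only.
    Returns (order, mask), where order[k] is the position generated at step k
    and mask[i][j] is True when position i may attend to position j.
    """
    order = list(range(seq_len))
    rng.shuffle(order)  # random factorization order over positions

    # rank[pos] = step at which position `pos` is generated
    rank = [0] * seq_len
    for step, pos in enumerate(order):
        rank[pos] = step

    # Position i sees itself and every position generated before it,
    # regardless of whether those positions lie to its left or right.
    mask = [[rank[j] <= rank[i] for j in range(seq_len)] for i in range(seq_len)]
    return order, mask


order, mask = permutation_causal_mask(6, random.Random(0))
```

Tokens stay at their original positions, so the sequence layout is untouched; only the visibility pattern changes. When `order` happens to be left-to-right, the mask reduces to the usual lower-triangular autoregressive mask.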
Taxonomy
Topics: Human Motion and Animation · Video Analysis and Summarization · Handwritten Text Recognition Techniques
Methods: Diffusion
