BAD: Bidirectional Auto-regressive Diffusion for Text-to-Motion Generation

S. Rohollah Hosseyni; Ali Ahmad Rahmani; S. Jamal Seyedmohammadi; Sanaz Seyedin; Arash Mohammadi

arXiv:2409.10847 · cs.CL · September 18, 2024

TL;DR

BAD introduces a novel bidirectional autoregressive diffusion model that combines the strengths of autoregressive and mask-based approaches, improving text-to-motion generation by effectively capturing complex sequence dependencies.

Contribution

It unifies autoregressive and mask-based models using permutation-based corruption, enabling better modeling of bidirectional and sequential dependencies.

Findings

01. Outperforms existing autoregressive and mask-based models in text-to-motion tasks.

02. Demonstrates the effectiveness of permutation-based corruption for sequence modeling.

03. Provides a new pre-training strategy for sequence generation.

Abstract

Autoregressive models excel in modeling sequential dependencies by enforcing causal constraints, yet they struggle to capture complex bidirectional patterns due to their unidirectional nature. In contrast, mask-based models leverage bidirectional context, enabling richer dependency modeling. However, they often assume token independence during prediction, which undermines the modeling of sequential dependencies. Additionally, the corruption of sequences through masking or absorption can introduce unnatural distortions, complicating the learning process. To address these issues, we propose Bidirectional Autoregressive Diffusion (BAD), a novel approach that unifies the strengths of autoregressive and mask-based generative models. BAD utilizes a permutation-based corruption technique that preserves the natural sequence structure while enforcing causal dependencies through randomized…
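To make the idea of permutation-based corruption concrete, here is a minimal sketch of one common way to combine a random ordering with causal visibility. It is an illustration under stated assumptions, not the authors' implementation: the function names `sample_order` and `permutation_causal_mask` are hypothetical, and the visibility rule (a position may see only positions that come no later than it in the sampled order) is assumed rather than taken from the paper. Tokens stay at their natural positions; only the order in which they become visible is randomized.

```python
import random


def sample_order(n, seed=None):
    """Sample a random generation order over n sequence positions.

    Hypothetical helper: order[k] is the position revealed at step k.
    """
    rng = random.Random(seed)
    order = list(range(n))
    rng.shuffle(order)
    return order


def permutation_causal_mask(order):
    """Build an n x n visibility mask from a generation order.

    mask[i][j] is True when position i may attend to position j, i.e.
    when j is revealed no later than i in the sampled order. This rule
    is an assumption made for illustration.
    """
    n = len(order)
    rank = [0] * n
    for step, pos in enumerate(order):
        rank[pos] = step  # step at which each position is revealed
    return [[rank[j] <= rank[i] for j in range(n)] for i in range(n)]


if __name__ == "__main__":
    order = sample_order(5, seed=0)
    for row in permutation_causal_mask(order):
        print("".join("x" if visible else "." for visible in row))
```

With the identity order `[0, 1, 2, ...]` the mask reduces to the usual lower-triangular causal mask of a left-to-right autoregressive model. Other orders let a position see context on both sides, which is the sense in which such a scheme can mix sequential and bidirectional dependencies.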

Code & Models

Repositories

rohollahhs/bad (PyTorch, official)

Taxonomy

Topics: Human Motion and Animation · Video Analysis and Summarization · Handwritten Text Recognition Techniques

Methods: Diffusion