Pay Attention and Move Better: Harnessing Attention for Interactive Motion Generation and Training-free Editing
Ling-Hao Chen, Shunlin Lu, Wenxun Dai, Zhiyang Dou, Xuan Ju, Jingbo Wang, Taku Komura, Lei Zhang

TL;DR
This paper introduces MotionCLR, an attention-based diffusion model for human motion generation that enables fine-grained, explainable editing by modeling word-sequence correspondence and motion features.
Contribution
The paper presents a novel attention-based diffusion model with CLeaR attention modeling for interactive, explainable human motion editing and generation.
Findings
Effective motion editing via attention map manipulation
Good explainability demonstrated through action-counting and grounded generation
Competitive performance in motion generation and editing tasks
Abstract
This research delves into the problem of interactive editing of human motion generation. Previous motion diffusion models lack explicit modeling of the word-level text-motion correspondence and good explainability, hence restricting their fine-grained editing ability. To address this issue, we propose an attention-based motion diffusion model, namely MotionCLR, with CLeaR modeling of attention mechanisms. Technically, MotionCLR models the in-modality and cross-modality interactions with self-attention and cross-attention, respectively. More specifically, the self-attention mechanism aims to measure the sequential similarity between frames and impacts the order of motion features. By contrast, the cross-attention mechanism works to find the fine-grained word-sequence correspondence and activate the corresponding timesteps in the motion sequence. Based on these key properties, we develop…
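The two attention roles described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the toy shapes, and the `word_scale` re-weighting are assumptions made for illustration, and the learned query/key/value projections of a real model are omitted. Self-attention compares frames with frames, cross-attention compares frames with words, and scaling one word's column of the cross-attention map is one plausible form of the attention map manipulation mentioned under Findings.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(motion):
    # motion: (frames, d). Each frame attends to every other frame,
    # so the map measures frame-to-frame similarity.
    d = motion.shape[-1]
    weights = softmax(motion @ motion.T / np.sqrt(d), axis=-1)  # (frames, frames)
    return weights @ motion, weights

def cross_attention(motion, words, word_scale=None):
    # motion: (frames, d), words: (num_words, d).
    # Each frame attends to every word, giving a frame-by-word map.
    d = motion.shape[-1]
    weights = softmax(motion @ words.T / np.sqrt(d), axis=-1)  # (frames, num_words)
    if word_scale is not None:
        # Hypothetical edit: scale the attention each frame pays to each word.
        weights = weights * word_scale
    return weights @ words, weights

# Toy inputs: 6 motion frames and 3 word embeddings, all of dimension 8.
rng = np.random.default_rng(0)
motion = rng.normal(size=(6, 8))
words = rng.normal(size=(3, 8))

_, frame_map = self_attention(motion)
_, base_map = cross_attention(motion, words)

# Emphasize the second word (index 1) by doubling its attention column.
scale = np.array([1.0, 2.0, 1.0])
_, edited_map = cross_attention(motion, words, word_scale=scale)
```

In this sketch the edited map is no longer row-normalized after scaling; whether to renormalize is a design choice the abstract does not specify.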
Taxonomy
Topics: Multimodal Machine Learning Applications · Video Analysis and Summarization · Advanced Vision and Imaging
Methods: Softmax · Attention Is All You Need · Diffusion · Sparse Evolutionary Training
