Pay Attention and Move Better: Harnessing Attention for Interactive Motion Generation and Training-free Editing

Ling-Hao Chen; Shunlin Lu; Wenxun Dai; Zhiyang Dou; Xuan Ju; Jingbo Wang; Taku Komura; Lei Zhang

arXiv:2410.18977·cs.CV·January 23, 2025

TL;DR

This paper introduces MotionCLR, an attention-based diffusion model for human motion generation that enables fine-grained, explainable editing by modeling word-sequence correspondence and motion features.

Contribution

The paper presents a novel attention-based diffusion model with CLeaR attention modeling for interactive, explainable human motion editing and generation.

Findings

01. Effective motion editing via attention map manipulation

02. Good explainability demonstrated through action-counting and grounded generation

03. Competitive performance in motion generation and editing tasks

Abstract

This research delves into the problem of interactive editing of human motion generation. Previous motion diffusion models lack explicit modeling of the word-level text-motion correspondence and good explainability, hence restricting their fine-grained editing ability. To address this issue, we propose an attention-based motion diffusion model, namely MotionCLR, with CLeaR modeling of attention mechanisms. Technically, MotionCLR models the in-modality and cross-modality interactions with self-attention and cross-attention, respectively. More specifically, the self-attention mechanism aims to measure the sequential similarity between frames and impacts the order of motion features. By contrast, the cross-attention mechanism works to find the fine-grained word-sequence correspondence and activate the corresponding timesteps in the motion sequence. Based on these key properties, we develop…
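The two attention roles described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the feature dimensions, the random inputs, and the helper names `attention_map` and `reweight_word` are assumptions made for illustration only. Self-attention yields a frame-by-frame similarity map, cross-attention yields a frame-by-word map, and scaling one word's column is one hypothetical form of attention map manipulation.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention_map(queries, keys):
    # Scaled dot-product attention weights; each row sums to 1.
    d = queries.shape[-1]
    return softmax(queries @ keys.T / np.sqrt(d), axis=-1)

def reweight_word(cross_map, word_index, scale):
    # Hypothetical edit: scale the attention every frame pays to one word.
    # The result is deliberately left unnormalized in this sketch.
    edited = cross_map.copy()
    edited[:, word_index] *= scale
    return edited

rng = np.random.default_rng(0)
num_frames, num_words, dim = 8, 4, 16            # assumed toy sizes
motion = rng.normal(size=(num_frames, dim))      # stand-in motion features
words = rng.normal(size=(num_words, dim))        # stand-in word embeddings

self_map = attention_map(motion, motion)   # (frames, frames): frame similarity
cross_map = attention_map(motion, words)   # (frames, words): word-to-frame grounding
edited_map = reweight_word(cross_map, word_index=2, scale=0.5)
```

In this sketch, reading down one column of `cross_map` shows which frames respond to a given word, which is the kind of correspondence the abstract attributes to cross-attention.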

Code & Models

Models

🤗 EvanTHU/MotionCLR · model · ♡ 5

Taxonomy

Topics: Multimodal Machine Learning Applications · Video Analysis and Summarization · Advanced Vision and Imaging

Methods: Softmax · Attention Is All You Need · Diffusion · Sparse Evolutionary Training