KMM: Key Frame Mask Mamba for Extended Motion Generation

Zeyu Zhang; Hang Gao; Akide Liu; Qi Chen; Feng Chen; Yiran Wang; Danning Li; Rui Zhao; Zhenming Li; Zhongwen Zhou; Hao Tang; Bohan Zhuang

arXiv:2411.06481 · cs.CV · April 17, 2025

TL;DR

This paper introduces KMM, a Mamba-based architecture that improves the generation of long and complex human motion by masking key frames and strengthening multimodal fusion between text and motion. It achieves state-of-the-art results on the BABEL dataset.

Contribution

The paper proposes KMM with key frame masking, a contrastive learning paradigm for better multimodal fusion, and demonstrates superior performance on human motion generation tasks.
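
The summary does not say which contrastive objective the paper uses. As a rough illustration of how a contrastive paradigm can align text and motion, here is a minimal sketch of a symmetric InfoNCE-style loss over paired embeddings. The function names, the temperature value, and the use of cosine similarity are all assumptions made for this example, not details taken from the paper.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def cross_entropy_diagonal(logits):
    """Mean cross-entropy where the correct class for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        row_max = max(row)  # subtract the max for numerical stability
        log_sum_exp = row_max + math.log(sum(math.exp(v - row_max) for v in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def symmetric_contrastive_loss(text_emb, motion_emb, temperature=0.07):
    """Hypothetical text-motion contrastive loss (assumed, not the paper's).

    text_emb[i] and motion_emb[i] are treated as a matching pair; every
    other combination in the batch is treated as a negative.
    """
    text = [l2_normalize(t) for t in text_emb]
    motion = [l2_normalize(m) for m in motion_emb]
    # Cosine similarity between every text and every motion, scaled by temperature.
    logits = [
        [sum(a * b for a, b in zip(t, m)) / temperature for m in motion]
        for t in text
    ]
    logits_transposed = [list(col) for col in zip(*logits)]
    # Average the text-to-motion and motion-to-text directions.
    return 0.5 * (
        cross_entropy_diagonal(logits) + cross_entropy_diagonal(logits_transposed)
    )


# Usage: correctly paired embeddings should score a lower loss than swapped ones.
pairs = [[1.0, 0.0], [0.0, 1.0]]
loss_aligned = symmetric_contrastive_loss(pairs, pairs)
loss_swapped = symmetric_contrastive_loss(pairs, [[0.0, 1.0], [1.0, 0.0]])
```

Minimizing a loss of this shape pulls each text embedding toward its own motion embedding and away from the others in the batch. That is one way a model could learn to keep cues such as "left" and "right" distinct.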

Findings

01. Reduced FID by more than 57% compared to previous methods.

02. Reduced model parameters by 70% compared to previous methods.

03. Enhanced focus on key actions in motion segments.

Abstract

Human motion generation is a cutting-edge area of research in generative computer vision, with promising applications in video creation, game development, and robotic manipulation. The recent Mamba architecture shows promising results in efficiently modeling long and complex sequences, yet two significant challenges remain. First, directly applying Mamba to extended motion generation is ineffective, as the limited capacity of the implicit memory leads to memory decay. Second, Mamba struggles with multimodal fusion compared to Transformers and lacks alignment with textual queries, often confusing directions (left or right) or omitting parts of longer text queries. To address these challenges, our paper presents three key contributions. First, we introduce KMM, a novel architecture featuring Key frame Masking Modeling, designed to enhance Mamba's focus on key actions in motion segments.…
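
The abstract excerpt does not describe how key frames are chosen or masked. To make the general idea concrete, the following is a minimal sketch that masks the frames with the largest frame-to-frame change. This selection heuristic, the `mask_ratio` parameter, and the function names are assumptions for illustration only. They stand in for whatever criterion KMM actually uses.

```python
def frame_change_scores(frames):
    """Score each frame by its squared distance to the previous frame.

    The first frame has no predecessor and gets a score of 0.0.
    This scoring rule is an assumed stand-in, not the paper's criterion.
    """
    scores = [0.0]
    for prev, curr in zip(frames, frames[1:]):
        scores.append(sum((c - p) ** 2 for p, c in zip(prev, curr)))
    return scores


def mask_key_frames(frames, mask_ratio=0.25, mask_value=0.0):
    """Replace the highest-scoring frames with a constant mask value.

    Returns the masked sequence and the sorted indices of the masked frames,
    so a model could be trained to reconstruct exactly those frames.
    """
    scores = frame_change_scores(frames)
    num_masked = max(1, round(len(frames) * mask_ratio))
    ranked = sorted(range(len(frames)), key=lambda i: scores[i], reverse=True)
    key_indices = sorted(ranked[:num_masked])
    key_set = set(key_indices)
    masked = [
        [mask_value] * len(frame) if i in key_set else list(frame)
        for i, frame in enumerate(frames)
    ]
    return masked, key_indices


# Usage: a toy 1-D motion with one abrupt change at index 2.
motion = [[0.0], [0.1], [2.0], [2.1]]
masked_motion, key_indices = mask_key_frames(motion, mask_ratio=0.25)
```

Masking the most informative frames, rather than random ones, forces a reconstruction objective to concentrate on the parts of the sequence that carry the key actions.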

Code & Models

Repositories

steve-zeyu-zhang/KMM (Official)

Taxonomy

Topics: Robotic Mechanisms and Dynamics · Robotics and Sensor-Based Localization · Advanced Vision and Imaging

Methods: Contrastive Learning · Mamba: Linear-Time Sequence Modeling with Selective State Spaces · Focus