A Unified Framework for Motion Reasoning and Generation in Human Interaction

Jeongeun Park; Sungjoon Choi; Sangdoo Yun

arXiv:2410.05628·cs.AI·March 13, 2025

TL;DR

This paper introduces VIM, a unified model that integrates language and motion understanding to generate and control interactive human motions in multi-turn conversations, supported by the large-scale dataset Inter-MT2.

Contribution

The paper presents VIM, a novel unified architecture for simultaneous motion and language processing, and introduces Inter-MT2, a large-scale dataset for interactive motion instruction tuning.

Findings

01. VIM effectively handles multiple interactive motion tasks.

02. Inter-MT2 enables training of versatile motion-language models.

03. VIM demonstrates strong performance across diverse motion understanding and generation tasks.

Abstract

Recent advancements in large language models (LLMs) have significantly improved their ability to generate natural and contextually relevant text, enabling more human-like AI interactions. However, generating and understanding interactive human-like motion, where multiple individuals engage in coordinated movements, remains challenging due to the complexity of modeling these interactions. Additionally, a unified and versatile model is needed to handle diverse interactive scenarios, such as chat systems that dynamically adapt to user instructions and assigned roles. To address these challenges, we introduce VIM, the Versatile Interactive Motion-language model, which integrates both language and motion modalities to effectively understand, generate, and control interactive motions in multi-turn conversational contexts. Unlike previous studies that primarily focus on uni-directional tasks…

Taxonomy

Topics: Human Motion and Animation

Methods: ALIGN