ChatMotion: A Multimodal Multi-Agent for Human Motion Analysis

Lei Li; Sen Jia; Jianhao Wang; Zhaochong An; Jiaang Li; Jenq-Neng Hwang; Serge Belongie

arXiv:2502.18180 · cs.AI · February 28, 2025

Open Access

TL;DR

ChatMotion is a multimodal multi-agent framework that enhances human motion analysis by interpreting user intent, decomposing tasks, and integrating specialized modules for improved understanding and interactivity.

Contribution

It introduces a novel multi-agent system that dynamically interprets user needs and combines multiple modules for comprehensive human motion analysis.

Findings

01 Demonstrates high precision in motion understanding

02 Shows adaptability to diverse analytical tasks

03 Engages users effectively in motion analysis

Abstract

Advancements in Multimodal Large Language Models (MLLMs) have improved human motion understanding. However, these models remain constrained by their "instruct-only" nature, lacking interactivity and adaptability for diverse analytical perspectives. To address these challenges, we introduce ChatMotion, a multimodal multi-agent framework for human motion analysis. ChatMotion dynamically interprets user intent, decomposes complex tasks into meta-tasks, and activates specialized function modules for motion comprehension. It integrates multiple specialized modules, such as the MotionCore, to analyze human motion from various perspectives. Extensive experiments demonstrate ChatMotion's precision, adaptability, and user engagement for human motion understanding.
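The pipeline the abstract describes (interpret intent, decompose into meta-tasks, activate function modules) can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the names `MetaTask`, `plan`, `run`, the two modules, and the keyword rules are hypothetical and are not taken from the paper, where a language-model planner and real motion models would presumably fill these roles.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class MetaTask:
    name: str      # which function module should handle this step
    payload: str   # identifier of the motion input the step operates on


# Hypothetical function modules; real ones would wrap motion models.
def caption_module(payload: str) -> str:
    return f"caption({payload})"


def compare_module(payload: str) -> str:
    return f"compare({payload})"


# Registry mapping meta-task names to the modules that serve them.
MODULES: Dict[str, Callable[[str], str]] = {
    "caption": caption_module,
    "compare": compare_module,
}

# Toy intent rules (keyword -> meta-task name), standing in for a planner.
INTENT_KEYWORDS = {"describe": "caption", "difference": "compare"}


def plan(request: str, motion_id: str) -> List[MetaTask]:
    """Decompose a user request into meta-tasks by keyword matching."""
    text = request.lower()
    return [
        MetaTask(task, motion_id)
        for keyword, task in INTENT_KEYWORDS.items()
        if keyword in text
    ]


def run(request: str, motion_id: str) -> List[str]:
    """Dispatch each planned meta-task to its registered module."""
    return [MODULES[task.name](task.payload) for task in plan(request, motion_id)]
```

Calling `run("Describe the clip and explain the difference", "clip_01")` plans two meta-tasks and routes each to its module; a request that matches no rule yields an empty plan.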

Taxonomy

Topics: Human Motion and Animation · Action Observation and Synchronization · Human Pose and Action Recognition