CrowdMoGen: Zero-Shot Text-Driven Collective Motion Generation

Yukang Cao; Xinying Guo; Mingyuan Zhang; Haozhe Xie; Chenyang Gu; Ziwei Liu

arXiv:2407.06188 · cs.CV · May 12, 2025

TL;DR

CrowdMoGen is a novel zero-shot framework that uses large language models and SMPL priors to generate realistic, event-aligned collective crowd motions from text prompts, addressing scalability and controllability challenges.

Contribution

CrowdMoGen introduces the first zero-shot collective motion generation framework, combining LLMs for scene planning with a transformer-based generator for realistic crowd motions.

Findings

01. Outperforms previous methods in realism and coherence

02. Effectively organizes individuals into groups using LLMs

03. Generates contextually appropriate, event-driven crowd motions

Abstract

While recent advances in text-to-motion generation have shown promising results, they typically assume all individuals are grouped as a single unit. Scaling these methods to handle larger crowds and ensuring that individuals respond appropriately to specific events remain significant challenges. This is primarily due to two complexities: scene planning, which involves organizing groups, planning their activities, and coordinating interactions; and controllable motion generation. In this paper, we present CrowdMoGen, the first zero-shot framework for collective motion generation, which effectively groups individuals and generates event-aligned motion sequences from text prompts. 1) Because the available datasets are too limited to train an effective scene planning module in a supervised manner, we instead propose a crowd scene planner that leverages pre-trained large language models…
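To make the grouping step of scene planning concrete, here is a minimal sketch of how a scene plan (named groups, their sizes, and one activity prompt per group) could be expanded into per-individual text prompts for a downstream motion generator. The `GroupPlan` schema and `expand_plan` helper are hypothetical illustrations, not CrowdMoGen's actual planner output format or API.

```python
from dataclasses import dataclass


@dataclass
class GroupPlan:
    """One group in a hypothetical crowd scene plan (illustrative schema, not the paper's)."""
    name: str
    size: int
    activity: str  # text prompt describing what every member of this group does


def expand_plan(groups: list[GroupPlan]) -> list[tuple[int, str, str]]:
    """Assign each individual a sequential ID, its group name, and its group's prompt."""
    assignments = []
    next_id = 0
    for group in groups:
        for _ in range(group.size):
            assignments.append((next_id, group.name, group.activity))
            next_id += 1
    return assignments


# Hand-written example plan; in CrowdMoGen a plan like this would come from the LLM-based planner.
plan = [
    GroupPlan("spectators", 3, "a person claps and cheers"),
    GroupPlan("performers", 2, "a person dances energetically"),
]

for person in expand_plan(plan):
    print(person)
```

Here the plan is hard-coded purely for illustration; the point is only that group-level decisions fan out into individual-level prompts, which is the interface a per-person motion generator would consume.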

Taxonomy

Topics: Human Motion and Animation · Interactive and Immersive Displays · Social Robot Interaction and HRI
