ViMo: Generating Motions from Casual Videos

Liangdong Qiu; Chengxing Yu; Yanran Li; Zhao Wang; Haibin Huang; Chongyang Ma; Di Zhang; Pengfei Wan; Xiaoguang Han

arXiv:2408.06614·cs.CV·August 14, 2024

Open Access

TL;DR

ViMo leverages diffusion models to generate diverse, realistic 3D human motions from casual videos, overcoming challenges like camera movements and occlusions, and enabling applications like dance motion synthesis.

Contribution

Introduces ViMo, a novel diffusion-based video-to-motion framework that captures motions from casual videos, extending motion generation beyond the limited size of Mocap datasets.

Findings

01. Generates natural, diverse motions from complex videos

02. Handles rapid movements, varying perspectives, and occlusions effectively

03. Enables applications like dance motion synthesis from music and style

Abstract

Although humans have the innate ability to imagine multiple possible actions from videos, this remains an extraordinary challenge for computers because of intricate camera movements and montages. Most existing motion generation methods rely on manually collected motion datasets, usually tediously sourced from motion capture (Mocap) systems or multi-view cameras, which unavoidably limits dataset size and severely undermines generalizability. Inspired by recent advances in diffusion models, we probe a simple and effective way to capture motions from videos and propose a novel Video-to-Motion-Generation framework (ViMo) that can leverage the immense trove of untapped video content to produce abundant and diverse 3D human motions. Distinct from prior work, our videos can be more casual, including complicated camera movements and occlusions. Striking experimental…

Taxonomy

Topics: Human Pose and Action Recognition · Human Motion and Animation · Video Analysis and Summarization

Methods: Diffusion