MotionCLIP: Exposing Human Motion Generation to CLIP Space

Guy Tevet; Brian Gordon; Amir Hertz; Amit H. Bermano; Daniel Cohen-Or

arXiv:2203.08063 · cs.CV · March 16, 2022

TL;DR

MotionCLIP is a novel 3D human motion auto-encoder that aligns its latent space with CLIP, enabling rich semantic understanding, out-of-domain action generation, and advanced motion editing from textual descriptions.

Contribution

The paper introduces a transformer-based motion auto-encoder whose latent space is aligned with CLIP space, enabling semantic, out-of-domain, and disentangled motion generation from text prompts.

Findings

01 Enables text-to-motion generation for unseen actions

02 Supports motion editing and interpolation using semantic space

03 Achieves high-quality, semantically meaningful motion synthesis
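
The editing and interpolation finding can be pictured as plain vector operations on latent codes. The sketch below is illustrative only: it assumes latent codes are fixed-length vectors, that interpolation is linear, and that an edit subtracts one concept vector and adds another. The function names `lerp_latents` and `edit_latent` are hypothetical, and decoding the result back to a motion (not shown) would be the job of the trained decoder.

```python
def lerp_latents(z_a, z_b, t):
    """Linearly interpolate between two latent codes (assumed scheme).

    t = 0 returns z_a, t = 1 returns z_b.
    """
    return [(1.0 - t) * a + t * b for a, b in zip(z_a, z_b)]


def edit_latent(z, remove, add):
    """Latent arithmetic edit (assumed scheme): z - remove + add."""
    return [v - r + a for v, r, a in zip(z, remove, add)]


# Toy 2-D latents, for illustration only.
halfway = lerp_latents([0.0, 0.0], [2.0, 4.0], 0.5)
edited = edit_latent([1.0, 1.0], remove=[1.0, 0.0], add=[0.0, 1.0])
```

Because the latent space is described as aligned with CLIP space, the `remove` and `add` vectors could in principle come from text embeddings, though that choice is an assumption here.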

Abstract

We introduce MotionCLIP, a 3D human motion auto-encoder featuring a latent embedding that is disentangled, well behaved, and supports highly semantic textual descriptions. MotionCLIP gains its unique power by aligning its latent space with that of the Contrastive Language-Image Pre-training (CLIP) model. Aligning the human motion manifold to CLIP space implicitly infuses the extremely rich semantic knowledge of CLIP into the manifold. In particular, it helps continuity by placing semantically similar motions close to one another, and disentanglement, which is inherited from the CLIP-space structure. MotionCLIP comprises a transformer-based motion auto-encoder, trained to reconstruct motion while being aligned to its text label's position in CLIP-space. We further leverage CLIP's unique visual understanding and inject an even stronger signal through aligning motion to rendered frames in…
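
The training objective described in the abstract, reconstruction plus alignment to the text label's CLIP embedding and to rendered frames, might be sketched as below. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes mean squared error for reconstruction and cosine distance for both alignment terms, and the weights `w_text` and `w_image` and all function names are hypothetical. Inputs are plain lists standing in for flattened motions and embeddings.

```python
import math


def cosine_distance(a, b):
    """Return 1 - cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def mse(pred, target):
    """Mean squared error between two equal-length vectors."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def alignment_loss(motion, reconstruction, latent, text_emb, image_emb,
                   w_text=1.0, w_image=1.0):
    """Reconstruction term plus two CLIP-alignment terms (assumed form)."""
    recon_term = mse(reconstruction, motion)
    text_term = cosine_distance(latent, text_emb)    # pull latent toward text embedding
    image_term = cosine_distance(latent, image_emb)  # pull latent toward frame embedding
    return recon_term + w_text * text_term + w_image * image_term


# Perfect reconstruction and a latent parallel to both embeddings: loss is ~0.
loss = alignment_loss([1.0, 2.0], [1.0, 2.0], [1.0, 0.0], [2.0, 0.0], [3.0, 0.0])
```

Cosine distance ignores vector magnitude, so under this assumed form only the direction of the latent code is pulled toward the CLIP embeddings.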

Code & Models

Repositories

guytevet/motionclip (PyTorch, official)

Models

vonexel/smog (Hugging Face model)

Taxonomy

Methods: Contrastive Language-Image Pre-training