TapMo: Shape-aware Motion Generation of Skeleton-free Characters

Jiaxu Zhang; Shaoli Huang; Zhigang Tu; Xin Chen; Xiaohang Zhan; Gang Yu; Ying Shan

arXiv:2310.12678·cs.GR·October 20, 2023·1 cites

Open Access 3 Reviews

TL;DR

TapMo introduces a shape-aware, text-driven motion generation pipeline for skeleton-free 3D characters, enabling realistic animations without traditional rigging, and outperforms existing methods in quality and generalizability.

Contribution

The paper presents TapMo, a novel shape deformation-aware diffusion model that generates mesh-specific motions for non-rigged characters, eliminating the need for skeletal rigging.

Findings

01. Outperforms existing auto-animation methods in quality.

02. Works effectively on both seen and unseen 3D characters.

03. Handles diverse non-human meshes with shape-aware features.

Abstract

Previous motion generation methods are limited to pre-rigged 3D human models, hindering their application to the animation of various non-rigged characters. In this work, we present TapMo, a Text-driven Animation Pipeline for synthesizing Motion in a broad spectrum of skeleton-free 3D characters. The pivotal innovation in TapMo is its use of shape deformation-aware features as a condition to guide the diffusion model, thereby enabling the generation of mesh-specific motions for various characters. Specifically, TapMo comprises two main components: Mesh Handle Predictor and Shape-aware Diffusion Module. Mesh Handle Predictor predicts the skinning weights and clusters mesh vertices into adaptive handles for deformation control, which eliminates the need for traditional skeletal rigging. Shape-aware Motion Diffusion synthesizes motion with mesh-specific adaptations. This module…
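
To make the handle-based deformation described in the abstract more concrete, the following is a minimal sketch, not the authors' implementation. It assumes a standard linear-blend-skinning formulation in which each handle carries a rigid transform and each vertex blends those transforms using its skinning weights; the abstract does not spell out the exact formula. The function names `handle_centers` and `deform`, the array shapes, and the choice of weighted centroids as handle positions are all illustrative assumptions.

```python
import numpy as np


def handle_centers(vertices, weights):
    """Place each handle at the weight-averaged centroid of its vertices.

    vertices: (N, 3) rest-pose vertex positions.
    weights:  (N, K) skinning weights; each row is assumed to sum to 1.
    Returns a (K, 3) array of handle positions (an assumed convention).
    """
    mass = np.maximum(weights.sum(axis=0), 1e-8)  # guard against empty handles
    return (weights.T @ vertices) / mass[:, None]


def deform(vertices, weights, rotations, translations):
    """Blend per-handle rigid transforms into per-vertex positions.

    rotations:    (K, 3, 3) rotation matrix per handle.
    translations: (K, 3) translation per handle.
    Each vertex is rotated about every handle center, translated,
    and the K candidate positions are mixed by the skinning weights.
    """
    centers = handle_centers(vertices, weights)            # (K, 3)
    local = vertices[None, :, :] - centers[:, None, :]     # (K, N, 3)
    rotated = np.einsum("kab,knb->kna", rotations, local)  # R_k applied to each offset
    moved = rotated + centers[:, None, :] + translations[:, None, :]
    return np.einsum("nk,knd->nd", weights, moved)         # (N, 3)


if __name__ == "__main__":
    # Hypothetical toy mesh: four vertices split between two handles.
    V = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0], [1.0, 1.0, 0.0]])
    W = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])
    R = np.stack([np.eye(3), np.eye(3)])
    t = np.array([[0.0, 0.0, 1.0], [0.0, 0.0, 0.0]])  # lift handle 0 only
    print(deform(V, W, R, t))
```

In this sketch no skeleton hierarchy is involved: a motion is just a sequence of per-handle rotations and translations, which is one plausible reading of how predicted skinning weights and handles can replace traditional rigging.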

Peer Reviews

Decision·ICLR 2024 poster

Reviewer 01 · Rating 6 · marginally above the acceptance threshold · Confidence 4

Strengths

1. The generated animations for a variety of 3D characters are impressive. The structure of each 3D mesh is clearly reflected in its animation. 2. The application of the shape deformation feature to animation is nice.

Weaknesses

The generated animations still show penetration between the feet and the ground, which degrades the animation quality.

Reviewer 02 · Rating 6 · marginally above the acceptance threshold · Confidence 4

Strengths

The research addresses an interesting and promising problem. As far as I know, it is the first attempt to enable text-driven motion synthesis for skeleton-free characters. Comprehensive experiments are conducted, yielding impressive results across diverse shapes. Supplementary videos and a user study further validate the naturalness of the generated results. The combination of diffusion-based motion synthesis and skeleton-free mesh deformation is interesting and novel.

Weaknesses

Some details are not clearly explained, such as the mesh deformation feature: what exactly f_ is, how it is obtained, and its dimensions are not stated in the main text. From the appendix, it seems to be a 512-dimensional vector. Further explanation from the authors is desired. How mesh-specific adaptation affects the vertices is also not included in the equations. How is the Discriminator implemented? Are the two modules trained jointly or separately? What are the visualizat

Reviewer 03 · Rating 6 · marginally above the acceptance threshold · Confidence 4

Strengths

1. It is good to study the generation of shape-aware motions, especially for non-humanoid 3D characters. 2. The proposed method seems reasonable and may be promising for generating motions for unseen characters.

Weaknesses

1. The proposed mesh handle predictor is simple and straightforward, but it is not clear how the proposed method resolves different characters that have different topologies with different semantics. Currently, the manuscript mentions that "each handle is dynamically assigned to vertices with the same semantics across different meshes", but it is not clear how the method will select those handles. Also, it is unclear how the method will choose the number of handles since different topologies te

Taxonomy

Topics: Human Motion and Animation · 3D Shape Modeling and Analysis · Human Pose and Action Recognition

Methods: Diffusion