Diffusion-based 3D Hand Motion Recovery with Intuitive Physics
Yufei Zhang, Zijun Cui, Jeffrey O. Kephart, Qiang Ji

TL;DR
This paper introduces a diffusion-based framework for 3D hand motion recovery from videos that incorporates physics knowledge, improving accuracy and temporal coherence without needing annotated video data.
Contribution
A novel physics-augmented diffusion model refines the 3D hand motion sequences produced by existing image-based methods, and it is trained using only motion capture data.
Findings
Achieves state-of-the-art performance on benchmarks.
Significantly improves temporal coherence in hand motion sequences.
Effectively integrates physics constraints into the diffusion process.
Abstract
While 3D hand reconstruction from monocular images has made significant progress, generating accurate and temporally coherent motion estimates from videos remains challenging, particularly during hand-object interactions. In this paper, we present a novel 3D hand motion recovery framework that enhances image-based reconstructions through a diffusion-based and physics-augmented motion refinement model. Our model captures the distribution of refined motion estimates conditioned on initial ones, generating improved sequences through an iterative denoising process. Instead of relying on scarce annotated video data, we train our model using only motion capture data, without images. We identify valuable intuitive physics knowledge during hand-object interactions, including key motion states and their associated motion constraints. We effectively integrate these physical insights into our…
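The abstract frames refinement as conditional sampling: start from noise and iteratively denoise toward a refined sequence, conditioned on the initial image-based estimate, with physics constraints folded into the process. The NumPy sketch below illustrates that general pattern. It is not the authors' implementation. It uses a generic DDPM-style ancestral sampling loop with a linear noise schedule (an assumption). `toy_x0_predictor` is a hypothetical stand-in for the trained network: it simply smooths the initial estimate. `apply_static_constraint` is a hypothetical "static frame, zero velocity" rule standing in for the paper's motion-state constraints. The constraint is imposed by projecting each clean-motion prediction, which is one common way to enforce constraints during sampling but not necessarily the paper's mechanism.

```python
import numpy as np


def make_schedule(T=50, beta_start=1e-4, beta_end=0.02):
    """Linear DDPM noise schedule: betas, alphas, cumulative alpha products."""
    betas = np.linspace(beta_start, beta_end, T)
    alphas = 1.0 - betas
    return betas, alphas, np.cumprod(alphas)


def toy_x0_predictor(x_t, t, cond):
    """HYPOTHETICAL stand-in for a trained denoiser that would predict the clean
    motion from the noisy sample x_t, timestep t and initial estimate cond.
    This toy ignores x_t and t and returns a 3-frame moving average of cond
    (edge-padded), only so that the sampling loop runs end to end."""
    padded = np.concatenate([cond[:1], cond, cond[-1:]], axis=0)
    return (padded[:-2] + padded[1:-1] + padded[2:]) / 3.0


def apply_static_constraint(x0, static_mask):
    """HYPOTHETICAL physics constraint: frames flagged as static must not move,
    so each flagged frame is copied from the frame before it."""
    x0 = x0.copy()
    for f in range(1, x0.shape[0]):
        if static_mask[f]:
            x0[f] = x0[f - 1]
    return x0


def refine_motion(init_motion, static_mask, T=50, seed=0):
    """Ancestral sampling conditioned on init_motion (frames x pose dims).
    Each clean-motion prediction is projected onto the constraint."""
    rng = np.random.default_rng(seed)
    betas, alphas, alpha_bars = make_schedule(T)
    x = rng.standard_normal(init_motion.shape)  # start from pure noise
    for t in reversed(range(T)):
        x0_hat = toy_x0_predictor(x, t, init_motion)
        x0_hat = apply_static_constraint(x0_hat, static_mask)
        if t == 0:
            return x0_hat
        ab_t, ab_prev = alpha_bars[t], alpha_bars[t - 1]
        # Mean and variance of the DDPM posterior q(x_{t-1} | x_t, x0_hat).
        mean = (np.sqrt(ab_prev) * betas[t] / (1 - ab_t)) * x0_hat \
            + (np.sqrt(alphas[t]) * (1 - ab_prev) / (1 - ab_t)) * x
        var = betas[t] * (1 - ab_prev) / (1 - ab_t)
        x = mean + np.sqrt(var) * rng.standard_normal(x.shape)
    return x


# Usage with toy sizes; a real hand pose vector has many more dimensions.
frames, dims = 8, 3
init = np.random.default_rng(1).standard_normal((frames, dims))
static = np.zeros(frames, dtype=bool)
static[4:6] = True  # hypothetical "hand at rest" frames
refined = refine_motion(init, static)
```

Projecting the clean-motion estimate at every step, rather than only at the end, keeps every intermediate sample consistent with the constraint. With a real learned denoiser, the noisy sample `x` would also influence each prediction, which this toy predictor ignores.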
Taxonomy
Topics: Human Pose and Action Recognition · Human Motion and Animation · Robot Manipulation and Learning
