DeepSimHO: Stable Pose Estimation for Hand-Object Interaction via Physics Simulation
Rong Wang, Wei Mao, Hongdong Li

TL;DR
DeepSimHO is a deep-learning pipeline that combines forward physics simulation with a neural network that approximates the simulation's gradients, improving the stability and efficiency of 3D hand-object pose estimation from a single image.
Contribution
The paper combines physics simulation with a neural network to evaluate and refine estimated hand-object poses, addressing the instability of poses produced by estimators that rely on proximity cues alone.
Findings
Significantly improves stability of estimated poses
Achieves superior efficiency over test-time optimization
Effectively approximates physics simulation gradients
Abstract
This paper addresses the task of 3D pose estimation for a hand interacting with an object from a single image observation. When modeling hand-object interaction, previous works mainly exploit proximity cues while overlooking the dynamical nature of the interaction: the hand must stably grasp the object to counteract gravity, thereby preventing the object from slipping or falling. These works fail to leverage dynamical constraints in the estimation and consequently often produce unstable results. Meanwhile, refining unstable configurations with physics-based reasoning remains challenging, owing both to the complexity of contact dynamics and to the lack of effective and efficient physics inference in the data-driven learning framework. To address both issues, we present DeepSimHO: a novel deep-learning pipeline that combines forward physics simulation and backward gradient approximation with a neural…
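The idea of pairing a forward simulation with a learned gradient approximation can be illustrated with a toy sketch. This is not the authors' implementation: `simulate` is a hypothetical stand-in for a physics engine (a simple analytic "instability" score over a 2-D pose), `Surrogate` is an assumed tiny one-hidden-layer network, and `fit_surrogate` and `refine` are illustrative helper names. The surrogate is trained only on forward simulator outputs, and its input gradient then stands in for the simulator gradient that is assumed to be unavailable.

```python
import math
import random


def simulate(pose):
    """Hypothetical black-box simulator: returns a scalar instability score.

    Treated as non-differentiable; only forward evaluations are used.
    """
    return (pose[0] - 0.5) ** 2 + (pose[1] + 0.3) ** 2


class Surrogate:
    """Assumed tiny tanh network that learns to predict the simulator score."""

    def __init__(self, n_in=2, n_hidden=8, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-1, 1) for _ in range(n_in)]
                   for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-1, 1) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, x):
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        y = sum(w * hj for w, hj in zip(self.w2, h)) + self.b2
        return y, h

    def train_step(self, x, target, lr):
        """One SGD step on the squared error 0.5 * (y - target)^2."""
        y, h = self.forward(x)
        err = y - target
        for j, hj in enumerate(h):
            # Gradient w.r.t. the hidden pre-activation (uses w2 before update).
            grad_pre = err * self.w2[j] * (1.0 - hj * hj)
            self.w2[j] -= lr * err * hj
            self.b1[j] -= lr * grad_pre
            for i, xi in enumerate(x):
                self.w1[j][i] -= lr * grad_pre * xi
        self.b2 -= lr * err

    def input_grad(self, x):
        """Gradient of the predicted score with respect to the input pose."""
        _, h = self.forward(x)
        return [sum(self.w2[j] * (1.0 - h[j] ** 2) * self.w1[j][i]
                    for j in range(len(h)))
                for i in range(len(x))]


def fit_surrogate(n_samples=200, epochs=200, lr=0.02, seed=0):
    """Train the surrogate on (pose, simulated score) pairs."""
    rng = random.Random(seed)
    poses = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(n_samples)]
    scores = [simulate(p) for p in poses]  # forward simulation only
    net = Surrogate(seed=seed)
    for _ in range(epochs):
        for p, s in zip(poses, scores):
            net.train_step(p, s, lr)
    return net


def refine(pose, net, steps=50, lr=0.1):
    """Descend the surrogate's gradient, keeping the pose in the sampled range."""
    pose = list(pose)
    for _ in range(steps):
        g = net.input_grad(pose)
        pose = [max(-1.0, min(1.0, p - lr * gi)) for p, gi in zip(pose, g)]
    return pose


net = fit_surrogate()
initial = [-0.8, 0.8]
refined = refine(initial, net)
print("instability before:", simulate(initial), "after:", simulate(refined))
```

For brevity the sketch applies the approximated gradient directly to a pose. The findings above contrast DeepSimHO with test-time optimization, so this loop should be read only as a demonstration that a learned surrogate can supply usable gradients for a black-box simulator, not as a description of how the paper uses them.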
Taxonomy
Topics: Human Pose and Action Recognition · Robot Manipulation and Learning · Hand Gesture Recognition Systems
Methods: Gravity · Balanced Selection
