Reconstructing Objects along Hand Interaction Timelines in Egocentric Video
Zhifan Zhu, Siddhant Bansal, Shashank Tripathi, Dima Damen

TL;DR
This paper introduces ROHIT, a task for reconstructing objects along hand interaction timelines in egocentric videos, together with a pose propagation framework that improves reconstruction accuracy. By focusing on stable grasps, the task can be annotated and evaluated without requiring 3D ground truth.
Contribution
The paper proposes a new task and a Constrained Optimisation and Propagation (COP) framework for object reconstruction along hand interaction timelines in egocentric videos, focusing on stable grasps.
Findings
COP improves stable grasp reconstruction by 6.2-11.3%.
HIT reconstruction improves by up to 24.5%.
Efficient annotation and evaluation without 3D ground truth.
Abstract
We introduce the task of Reconstructing Objects along Hand Interaction Timelines (ROHIT). We first define the Hand Interaction Timeline (HIT) from a rigid object's perspective. In a HIT, an object is first static relative to the scene, then is held in hand following contact, where its pose changes. This is usually followed by a firm grip during use, before it is released to be static again w.r.t. the scene. We model these pose constraints over the HIT, and propose to propagate the object's pose along the HIT, enabling superior reconstruction using our proposed Constrained Optimisation and Propagation (COP) framework. Importantly, we focus on timelines with stable grasps, i.e. where the hand is stably holding an object, effectively maintaining constant contact during use. This allows us to efficiently annotate, study, and evaluate object reconstruction in videos without 3D ground…
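The two rigid constraints the abstract describes can be illustrated with a minimal sketch. This is not the authors' COP implementation, which performs a constrained optimisation; it only shows how an object pose known at one anchor frame could be carried along a HIT segment. It assumes poses are 4x4 rigid transforms expressed in a common scene frame and that per-frame hand poses are available; the function names and phase labels (`"static"`, `"stable_grasp"`) are hypothetical. In a static segment the object pose is reused unchanged; in a stable grasp the hand-to-object offset is held fixed and moves with the hand; during in-hand manipulation no rigid constraint applies.

```python
from typing import Dict, List, Optional

Pose = List[List[float]]  # hypothetical: 4x4 homogeneous rigid transform, row-major


def compose(a: Pose, b: Pose) -> Pose:
    """Return the matrix product a @ b of two 4x4 transforms."""
    return [[sum(a[i][k] * b[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]


def invert(t: Pose) -> Pose:
    """Invert a rigid transform [R | p] as [R^T | -R^T p]."""
    rt = [[t[j][i] for j in range(3)] for i in range(3)]
    p = [-sum(rt[i][k] * t[k][3] for k in range(3)) for i in range(3)]
    return [rt[i] + [p[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]


def propagate(phase: str, frames: List[int], anchor: int,
              obj_at_anchor: Pose,
              hand: Dict[int, Pose]) -> Dict[int, Optional[Pose]]:
    """Propagate an object pose known at `anchor` to every frame in `frames`.

    "static":       object fixed w.r.t. the scene, so the pose is reused.
    "stable_grasp": object fixed w.r.t. the hand, so the hand-to-object
                    offset computed at the anchor is carried with the hand.
    other phases (e.g. in-hand manipulation): no rigid constraint, None.
    """
    if phase == "static":
        return {f: obj_at_anchor for f in frames}
    if phase == "stable_grasp":
        hand_to_obj = compose(invert(hand[anchor]), obj_at_anchor)
        return {f: compose(hand[f], hand_to_obj) for f in frames}
    return {f: None for f in frames}


def translation(x: float, y: float, z: float) -> Pose:
    """Pure translation transform (identity rotation)."""
    return [[1.0, 0.0, 0.0, x], [0.0, 1.0, 0.0, y],
            [0.0, 0.0, 1.0, z], [0.0, 0.0, 0.0, 1.0]]


# Usage: the hand moves while stably gripping an object offset by 0.1 along x.
hand = {0: translation(0, 0, 0), 1: translation(1, 0, 0), 2: translation(2, 0.5, 0)}
obj0 = translation(0.1, 0, 0)
poses = propagate("stable_grasp", [0, 1, 2], 0, obj0, hand)
print(poses[2][0][3], poses[2][1][3])
```

Keeping the offset in the hand's frame, rather than in the scene frame, is what lets one annotated frame constrain the whole stable-grasp segment.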
Taxonomy
Topics: Robot Manipulation and Learning · Human Pose and Action Recognition · Human Motion and Animation
