Reconstructing Objects along Hand Interaction Timelines in Egocentric Video

Zhifan Zhu; Siddhant Bansal; Shashank Tripathi; Dima Damen

arXiv:2512.07394 · cs.CV · December 9, 2025

Open Access

TL;DR

This paper introduces ROHIT, the task of reconstructing objects along hand interaction timelines in egocentric video, together with COP, a constrained optimisation and pose propagation framework that improves reconstruction and is evaluated without requiring 3D ground truth.

Contribution

The paper proposes a new task and a constrained optimization framework for object reconstruction along hand interaction timelines in egocentric videos, focusing on stable grasps.

Findings

01 COP improves stable grasp reconstruction by 6.2-11.3%.

02 HIT reconstruction improves by up to 24.5%.

03 Focusing on stable grasps allows efficient annotation and evaluation without 3D ground truth.

Abstract

We introduce the task of Reconstructing Objects along Hand Interaction Timelines (ROHIT). We first define the Hand Interaction Timeline (HIT) from a rigid object's perspective. In a HIT, an object is first static relative to the scene, then is held in hand following contact, where its pose changes. This is usually followed by a firm grip during use, before it is released to be static again with respect to the scene. We model these pose constraints over the HIT, and propose to propagate the object's pose along the HIT, enabling superior reconstruction using our proposed Constrained Optimisation and Propagation (COP) framework. Importantly, we focus on timelines with stable grasps, i.e. where the hand is stably holding an object, effectively maintaining constant contact during use. This allows us to efficiently annotate, study, and evaluate object reconstruction in videos without 3D ground…
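
The pose constraints described in the abstract can be illustrated with a minimal Python sketch. This is not the paper's COP framework: the phase labels, the `propagate_pose` function, and the 4x4 homogeneous-matrix convention are all hypothetical names chosen for illustration. It also assumes that a stable grasp can be approximated as a rigid attachment between hand and object, which is one possible reading of "maintaining constant contact", not a detail confirmed by the abstract.

```python
import numpy as np

# Hypothetical phase labels for one Hand Interaction Timeline (HIT).
STATIC = "static"          # object at rest relative to the scene
UNSTABLE = "unstable"      # in hand after contact, pose still changing
STABLE = "stable_grasp"    # firm grip during use


def propagate_pose(phase, T_world_obj_ref, T_world_hand_ref=None, T_world_hand_t=None):
    """Propagate a 4x4 object pose from a reference frame to another frame t.

    STATIC:   the object does not move in the scene, so the pose is copied.
    STABLE:   ASSUMPTION: the object is rigidly attached to the hand, so the
              hand-to-object transform is held fixed and re-applied at frame t.
    UNSTABLE: no constraint is modelled in this sketch; returns None.
    """
    if phase == STATIC:
        return T_world_obj_ref.copy()
    if phase == STABLE:
        # Object pose expressed in the hand's frame at the reference frame.
        T_hand_obj = np.linalg.inv(T_world_hand_ref) @ T_world_obj_ref
        return T_world_hand_t @ T_hand_obj
    return None


def translation(x, y, z):
    """Helper: 4x4 homogeneous transform with identity rotation."""
    T = np.eye(4)
    T[:3, 3] = [x, y, z]
    return T


# Usage: the hand moves 1 unit along x while stably grasping the object.
obj_ref = translation(0.0, 0.0, 1.0)
hand_ref = translation(0.0, 0.0, 0.0)
hand_t = translation(1.0, 0.0, 0.0)
obj_t = propagate_pose(STABLE, obj_ref, hand_ref, hand_t)
# The object's translation at frame t is [1, 0, 1]: it followed the hand.
```

In this sketch the static and stable-grasp phases each fix the object's pose in one reference frame (the scene or the hand), which is what makes propagation from a single estimated pose possible. The unstable phase is left unconstrained here.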

Taxonomy

Topics: Robot Manipulation and Learning · Human Pose and Action Recognition · Human Motion and Animation