AGILE: Hand-Object Interaction Reconstruction from Video via Agentic Generation

Jin-Chuan Shi; Binhong Ye; Tao Liu; Xiaoyang Liu; Yangjinhui Xu; Junzhe He; Zeju Li; Hao Chen; Chunhua Shen

arXiv:2602.04672 · cs.CV · May 11, 2026

TL;DR

AGILE reconstructs hand-object interactions from monocular videos by combining agentic generation with robust tracking, addressing the occlusion and initialization failures that affect existing methods.

Contribution

AGILE shifts from traditional reconstruction to agentic generation guided by a vision-language model, producing robust, complete, and physically plausible interaction reconstructions without relying on fragile SfM initialization.

Findings

01 Outperforms baselines in geometric accuracy.

02 Demonstrates robustness on challenging in-the-wild sequences.

03 Produces simulation-ready assets validated via real-to-sim retargeting.

Abstract

Reconstructing dynamic hand-object interactions from monocular videos is critical for dexterous manipulation data collection and creating realistic digital twins for robotics and VR. However, current methods face two prohibitive barriers: (1) reliance on neural rendering often yields fragmented, non-simulation-ready geometries under heavy occlusion, and (2) dependence on brittle Structure-from-Motion (SfM) initialization leads to frequent failures on in-the-wild footage. To overcome these limitations, we introduce AGILE, a robust framework that shifts the paradigm from reconstruction to agentic generation for interaction learning. First, we employ an agentic pipeline where a Vision-Language Model (VLM) guides a generative model to synthesize a complete, watertight object mesh with high-fidelity texture, independent of video occlusions. Second, bypassing fragile SfM entirely, we propose…

Code & Models

Repositories

https://agile-hoi.github.io