TIGeR: Text-Instructed Generation and Refinement for Template-Free Hand-Object Interaction

Yiyao Huang; Zhedong Zheng; Yu Ziwei; Yaxiong Wang; Tze Ho Elden Tse; Angela Yao

arXiv:2506.00953·cs.CV·June 3, 2025

Open Access

TL;DR

TIGeR introduces a text-driven framework for 3D object shape refinement and pose estimation in hand-object interactions, reducing manual effort and improving adaptability to occlusions.

Contribution

The paper presents a novel two-stage text-instructed generation and refinement framework that leverages text priors and vision guidance for template-free 3D object reconstruction.

Findings

01

Achieves competitive Chamfer distances on the DexYCB and ObMan datasets.

02

Demonstrates robustness to occlusion in object reconstruction.

03

Maintains compatibility with heterogeneous prior sources.
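
The Chamfer distance cited in the first finding measures how closely a reconstructed point set matches a reference one. The sketch below shows one common symmetric form, the mean squared nearest-neighbour distance summed over both directions. It is an illustration only: the function names are hypothetical, and the paper's exact convention (squared or unsquared distances, sum or mean, scaling) is not stated on this page, so it may differ.

```python
from typing import Sequence

Point = Sequence[float]


def squared_distance(p: Point, q: Point) -> float:
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((pi - qi) ** 2 for pi, qi in zip(p, q))


def chamfer_distance(a: Sequence[Point], b: Sequence[Point]) -> float:
    """Symmetric Chamfer distance between point sets a and b.

    Assumed convention (not taken from the paper): mean squared
    nearest-neighbour distance from a to b, plus the same from b to a.
    """
    if not a or not b:
        raise ValueError("both point sets must be non-empty")
    # For every point in a, find its closest point in b, then average.
    a_to_b = sum(min(squared_distance(p, q) for q in b) for p in a) / len(a)
    # Repeat in the other direction so the measure is symmetric.
    b_to_a = sum(min(squared_distance(q, p) for p in a) for q in b) / len(b)
    return a_to_b + b_to_a


# Toy example: the target is the prediction shifted by one unit along z.
predicted = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
target = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
print(chamfer_distance(predicted, target))
```

This brute-force version compares every pair of points, which is fine for a toy example. For dense object meshes a spatial index such as a k-d tree is the usual choice.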

Abstract

Pre-defined 3D object templates are widely used in 3D reconstruction of hand-object interactions. However, they often require substantial manual effort to capture or source, and they inherently restrict the adaptability of models to unconstrained interaction scenarios, e.g., heavily occluded objects. To overcome this bottleneck, we propose a new Text-Instructed Generation and Refinement (TIGeR) framework, which uses intuitive text-driven priors to steer object shape refinement and pose estimation. The framework has two stages: text-instructed prior generation followed by vision-guided refinement. As the name implies, we first leverage off-the-shelf models to generate shape priors from the text description, without tedious 3D crafting. Considering the geometric gap between the synthesized prototype and the real object that the hand interacts with, we further calibrate the…

Taxonomy

Topics: Speech and dialogue systems · Human Motion and Animation · Hand Gesture Recognition Systems