ArtHOI: Taming Foundation Models for Monocular 4D Reconstruction of Hand-Articulated-Object Interactions

Zikai Wang; Zhilu Zhang; Yiqing Wang; Hui Li; Wangmeng Zuo

arXiv:2603.25791 · cs.CV · March 30, 2026

TL;DR

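ArtHOI is an optimization-based framework that integrates and refines priors from multiple foundation models to reconstruct 4D hand-articulated-object interactions from a single monocular RGB video. Earlier hand-object interaction methods are largely limited to rigid objects, and earlier articulated-object methods generally need a pre-scanned object or multi-view video.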

Contribution

The paper introduces new methods, including Adaptive Sampling Refinement (ASR) and MLLM-guided alignment, to improve the accuracy and physical plausibility of monocular 4D reconstruction of articulated objects.

Findings

01. Robust reconstruction across diverse objects and interactions.

02. Effective optimization of object scale and pose from monocular videos.

03. Validated on new datasets with extensive experiments.

Abstract

Existing hand-object interaction (HOI) methods are largely limited to rigid objects, while 4D reconstruction methods for articulated objects generally require pre-scanning the object or even multi-view videos. Reconstructing 4D human-articulated-object interactions from a single monocular RGB video remains an unexplored but significant challenge. Recent advances in foundation models present a new opportunity to address this highly ill-posed problem. To this end, we introduce ArtHOI, an optimization-based framework that integrates and refines priors from multiple foundation models. Our key contribution is a suite of novel methodologies designed to resolve the inherent inaccuracies and physical implausibility of these priors. In particular, we introduce an Adaptive Sampling Refinement (ASR) method to optimize the object's metric scale and pose for grounding its normalized mesh…
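The abstract is cut off before it describes how ASR works, so the following is not the paper's algorithm. It is only a minimal sketch of the general idea of recovering a metric scale for a normalized mesh by repeated sampling, assuming known point correspondences and a coarse-to-fine search over a single scalar. The function names `alignment_error` and `refine_scale`, the search bounds, and the synthetic data are all hypothetical.

```python
import numpy as np


def alignment_error(scale, normalized_pts, metric_pts):
    """Mean squared distance after scaling and centroid-aligning the points."""
    scaled = scale * normalized_pts
    # For a fixed scale, the best translation is the one that matches centroids.
    translation = metric_pts.mean(axis=0) - scaled.mean(axis=0)
    residual = scaled + translation - metric_pts
    return float(np.mean(np.sum(residual ** 2, axis=1)))


def refine_scale(normalized_pts, metric_pts, lo=0.01, hi=10.0, samples=32, rounds=6):
    """Coarse-to-fine sampling search for a scalar scale (hypothetical helper).

    Each round samples the current interval uniformly, keeps the best
    candidate, and shrinks the interval to that candidate's neighbours.
    """
    best = lo
    for _ in range(rounds):
        candidates = np.linspace(lo, hi, samples)
        errors = [alignment_error(s, normalized_pts, metric_pts) for s in candidates]
        best = float(candidates[int(np.argmin(errors))])
        step = (hi - lo) / (samples - 1)
        lo, hi = max(best - step, 1e-6), best + step
    return best


# Synthetic check: a unit-scale point set observed at 0.37x scale with an offset.
rng = np.random.default_rng(0)
normalized = rng.uniform(-0.5, 0.5, size=(200, 3))
true_scale = 0.37
observed = true_scale * normalized + np.array([0.1, -0.2, 1.5])
estimated_scale = refine_scale(normalized, observed)
```

With exact correspondences the error is quadratic in the scale, so shrinking the interval around the best sample is safe here. A monocular setting would not provide such correspondences, and pose would have to be searched jointly with scale, which this sketch does not attempt.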
