RHINO: Reconstructing Human Interactions with Novel Objects from Monocular Videos

Lixin Xue; Chengwei Zheng; Georgios Paschalidis; Chen Guo; Manuel Kaufmann; Juan Zarate; Dimitrios Tzionas

arXiv:2605.17014 · cs.CV · May 19, 2026

TL;DR

RHINO is a three-step framework that reconstructs 3D humans, novel objects, and scenes from monocular videos, leveraging foundation models, motion estimation, and neural fields for accurate, physically plausible reconstructions.

Contribution

The paper introduces RHINO, a method that jointly reconstructs humans, unseen objects, and scenes from monocular videos. It addresses two core challenges: humans and objects occluding each other, and camera and object motion becoming entangled.

Findings

1. RHINO outperforms state-of-the-art methods on novel-view synthesis.
2. The framework achieves accurate 4D reconstructions with physically plausible shapes.
3. Each stage of RHINO significantly improves reconstruction quality.

Abstract

Reconstructing people, objects, and their interactions in 3D is a long-standing goal for intelligent systems. The input is often RGB video from a moving camera, which makes the task ill-posed: depth is ambiguous, humans and objects occlude each other, and camera and object motion entangle to create apparent motion. Most prior work either addresses humans or objects in isolation, ignoring their interplay, or assumes known 3D shapes or cameras, which is impractical for real-world applications. We develop RHINO (Reconstructing Human Interactions with Novel Objects), a three-step framework that recovers in 3D a human, a novel (unseen) manipulated object, and the static scene in a common world frame from a monocular RGB video. First, we leverage 3D-aware foundation models to obtain cues that stabilize Structure-from-Motion (SfM) even for low-texture regions; this yields a coarse shape and apparent motion of a…
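The abstract says that camera and object motion entangle to create apparent motion, and that RHINO places everything in a common world frame. The following is a minimal sketch of that geometric point only. It is not RHINO's method. It uses plain 4x4 rigid transforms and assumes poses are camera-to-world and object-to-world matrices. Under that convention, the object pose seen by the camera is `inv(T_world_cam) @ T_world_obj`. As a result, different camera/object motion pairs can produce the same apparent motion, and a world-frame object pose can only be recovered once the camera pose is known. All function names here are hypothetical.

```python
import math

# Hypothetical helpers: 4x4 homogeneous rigid transforms as nested lists.

def rot_z(theta):
    """3x3 rotation about the z-axis by angle theta (radians)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def make_pose(R, t):
    """Build a 4x4 rigid transform from a 3x3 rotation R and translation t."""
    return [R[0] + [t[0]], R[1] + [t[1]], R[2] + [t[2]], [0.0, 0.0, 0.0, 1.0]]

def matmul(A, B):
    """Multiply two 4x4 matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def invert_pose(T):
    """Invert a rigid transform [R | t] as [R^T | -R^T t]."""
    R = [row[:3] for row in T[:3]]
    t = [row[3] for row in T[:3]]
    Rt = [[R[j][i] for j in range(3)] for i in range(3)]
    t_inv = [-sum(Rt[i][k] * t[k] for k in range(3)) for i in range(3)]
    return make_pose(Rt, t_inv)

def apparent_object_pose(T_world_cam, T_world_obj):
    """Object pose as observed from the camera (what the video directly shows)."""
    return matmul(invert_pose(T_world_cam), T_world_obj)

def object_pose_in_world(T_world_cam, T_cam_obj):
    """Lift an observed (camera-relative) object pose back into the world frame."""
    return matmul(T_world_cam, T_cam_obj)

if __name__ == "__main__":
    identity = rot_z(0.0)
    # Static object at the world origin, camera shifted +1 along x:
    # the object *appears* to sit at x = -1 even though it never moved.
    T_obj = make_pose(identity, [0.0, 0.0, 0.0])
    T_cam = make_pose(identity, [1.0, 0.0, 0.0])
    print(apparent_object_pose(T_cam, T_obj)[0][3])
```

The same apparent pose arises if the camera and object translate together, which is why camera motion must be estimated (for example, from the static scene) before object motion can be expressed in the shared world frame.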

Code & Models

Repositories

https://lxxue.github.io/RHINO
