RnG: A Unified Transformer for Complete 3D Modeling from Partial Observations

Mochu Xiang; Zhelun Shen; Xuesong Li; Jiahui Ren; Jing Zhang; Chen Zhao; Shanshan Liu; Haocheng Feng; Jingdong Wang; Yuchao Dai

arXiv:2603.01194 · cs.CV · March 3, 2026

TL;DR

RnG introduces a unified Transformer model that reconstructs complete 3D structures from partial observations, accurately recovering visible geometry and generating plausible unseen parts for high-fidelity novel view rendering.

Contribution

The paper proposes a novel feed-forward Transformer with a reconstruction-guided causal attention mechanism that unifies 3D reconstruction and generation tasks.

Findings

01. Achieves state-of-the-art results in 3D reconstruction and view synthesis.

02. Effectively generates plausible unseen geometry and appearance.

03. Operates efficiently for real-time applications.

Abstract

Humans perceive the 3D world through 2D observations from limited viewpoints. While recent feed-forward generalizable 3D reconstruction models excel at recovering 3D structures from sparse images, their representations are often confined to observed regions, leaving unseen geometry unmodeled. This raises a fundamental question: can we infer a complete 3D structure from partial 2D observations? We present RnG (Reconstruction and Generation), a novel feed-forward Transformer that unifies these two tasks by predicting an implicit, complete 3D representation. At the core of RnG, we propose a reconstruction-guided causal attention mechanism that separates reconstruction and generation at the attention level and treats the KV-cache as an implicit 3D representation. Arbitrary poses can then efficiently query this cache to render high-fidelity, novel-view RGBD outputs. As a result, RnG…
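The idea of querying a KV-cache with arbitrary poses can be illustrated with a toy single-head attention layer. This is only a minimal sketch under stated assumptions, not the authors' architecture: the function names (`build_cache`, `query_cache`), the random projection matrices, the token shapes, and the use of random vectors in place of real image and pose embeddings are all hypothetical. It shows one property the abstract implies: source-view tokens are encoded once into keys and values, and pose queries read from that cache without modifying it, so each query can be evaluated independently.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def build_cache(src_tokens, w_k, w_v):
    # "Reconstruction" side (assumed): source-view tokens are projected
    # to keys and values once. They never attend to pose queries.
    return src_tokens @ w_k, src_tokens @ w_v

def query_cache(pose_tokens, w_q, cache):
    # "Generation" side (assumed): pose tokens attend only to the cached
    # keys/values, which play the role of the implicit 3D representation.
    k, v = cache
    q = pose_tokens @ w_q
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores) @ v

rng = np.random.default_rng(0)
d = 8  # hypothetical embedding width
w_q, w_k, w_v = (rng.standard_normal((d, d)) for _ in range(3))

src = rng.standard_normal((6, d))     # stand-in for source-view tokens
pose_a = rng.standard_normal((4, d))  # stand-in for target-pose query tokens
pose_b = rng.standard_normal((4, d))

cache = build_cache(src, w_k, w_v)
out_a = query_cache(pose_a, w_q, cache)
out_joint = query_cache(np.concatenate([pose_a, pose_b]), w_q, cache)
```

Because queries only read the cache, the result for `pose_a` is the same whether it is evaluated alone or batched with `pose_b`, which is what makes repeated novel-view queries cheap in this toy setting.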

Taxonomy

Topics: 3D Shape Modeling and Analysis · Advanced Vision and Imaging · Robotics and Sensor-Based Localization