AREA3D: Active Reconstruction Agent with Unified Feed-Forward 3D Perception and Vision-Language Guidance

Tianling Xu, Shengzhe Gan, Leslie Gu, Yuelei Li, Fangneng Zhan, and Hanspeter Pfister

arXiv:2512.05131 · cs.CV · December 8, 2025

TL;DR

AREA3D introduces an active 3D reconstruction agent that combines feed-forward 3D reconstruction models with vision-language guidance to improve reconstruction accuracy efficiently, especially when only a few views are available.

Contribution

The paper presents a novel active reconstruction framework that decouples uncertainty estimation from the reconstructor and integrates semantic guidance for better viewpoint selection.

Findings

01 Achieves state-of-the-art accuracy in sparse-view scenarios

02 Estimates view uncertainty effectively without online optimization

03 Uses vision-language guidance to select diverse and informative viewpoints

Abstract

Active 3D reconstruction enables an agent to autonomously select viewpoints to efficiently obtain accurate and complete scene geometry, rather than passively reconstructing scenes from pre-collected images. However, existing active reconstruction methods often rely on hand-crafted geometric heuristics, which can lead to redundant observations without substantially improving reconstruction quality. To address this limitation, we propose AREA3D, an active reconstruction agent that leverages feed-forward 3D reconstruction models and vision-language guidance. Our framework decouples view-uncertainty modeling from the underlying feed-forward reconstructor, enabling precise uncertainty estimation without expensive online optimization. In addition, an integrated vision-language model provides high-level semantic guidance, encouraging informative and diverse viewpoints beyond purely geometric…
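The abstract does not spell out AREA3D's view-selection rule, so the following is only a generic sketch of greedy next-best-view selection that combines a per-view uncertainty score, a semantic relevance score, and a distance-based diversity term. The `Candidate` fields, `sem_weight`, `min_sep`, and the multiplicative diversity factor are all hypothetical. The `uncertainty` and `semantic` values merely stand in for the outputs of an uncertainty estimator and a vision-language model, and this is not the paper's method.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass
class Candidate:
    """A candidate camera viewpoint (hypothetical structure)."""
    name: str
    position: tuple[float, float, float]
    uncertainty: float  # placeholder for a predicted view-uncertainty score
    semantic: float     # placeholder for a VLM-derived relevance score in [0, 1]


def select_views(candidates: list[Candidate], budget: int,
                 sem_weight: float = 0.5, min_sep: float = 1.0) -> list[Candidate]:
    """Greedily pick up to `budget` views using a hypothetical scoring rule."""
    chosen: list[Candidate] = []
    remaining = list(candidates)

    def score(c: Candidate) -> float:
        # Base utility: uncertainty plus weighted semantic relevance.
        base = c.uncertainty + sem_weight * c.semantic
        if not chosen:
            return base
        # Diversity term: scale down candidates closer than `min_sep`
        # to the nearest already selected viewpoint.
        d_min = min(math.dist(c.position, v.position) for v in chosen)
        return base * min(1.0, d_min / min_sep)

    # Repeatedly take the highest-scoring remaining view until the budget is spent.
    while remaining and len(chosen) < budget:
        best = max(remaining, key=score)
        chosen.append(best)
        remaining.remove(best)
    return chosen


if __name__ == "__main__":
    views = [
        Candidate("A", (0.0, 0.0, 0.0), uncertainty=0.9, semantic=0.2),
        Candidate("B", (0.1, 0.0, 0.0), uncertainty=0.8, semantic=0.3),
        Candidate("C", (5.0, 0.0, 0.0), uncertainty=0.5, semantic=0.8),
    ]
    print([c.name for c in select_views(views, budget=2)])  # ['A', 'C']
```

In this toy example, B has a higher base score than C, but it sits next to the already chosen A. The diversity factor therefore pushes C ahead, which mirrors the abstract's concern about redundant observations. A multiplicative factor keeps scores non-negative. Other formulations, such as subtractive penalties or learned scores, are equally plausible.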

Taxonomy

Topics: Advanced Vision and Imaging · Robotics and Sensor-Based Localization · 3D Shape Modeling and Analysis