UniScale: Unified Scale-Aware 3D Reconstruction for Multi-View Understanding via Prior Injection for Robotic Perception

Mohammad Mahdavian; Gordon Tan; Binbin Xu; Yuan Ren; Dongfeng Bai; Bingbing Liu

arXiv:2602.23224 · cs.CV · February 27, 2026

Open Access

TL;DR

UniScale is a unified, scale-aware 3D reconstruction framework for robotic perception that integrates geometric priors and multi-view information to accurately recover scene structure and scale in a single, resource-efficient model.

Contribution

It introduces a modular, semantically informed network that jointly estimates camera parameters, depth, point maps, and scene scale, leveraging priors without training from scratch.

Findings

01 Strong generalization across diverse environments

02 Improved accuracy with known camera intrinsics and poses

03 Effective integration of geometric priors in a unified model

Abstract

We present UniScale, a unified, scale-aware multi-view 3D reconstruction framework for robotic applications that flexibly integrates geometric priors through a modular, semantically informed design. In vision-based robotic navigation, the accurate extraction of environmental structure from raw image sequences is critical for downstream tasks. UniScale addresses this challenge with a single feed-forward network that jointly estimates camera intrinsics and extrinsics, scale-invariant depth and point maps, and the metric scale of a scene from multi-view images, while optionally incorporating auxiliary geometric priors when available. By combining global contextual reasoning with camera-aware feature representations, UniScale recovers the metric scale of the scene. In robotic settings where camera intrinsics are known, they can be easily incorporated to improve performance, with…
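To make the relationship between the quantities named in the abstract concrete, the following is a minimal sketch, not the authors' implementation. It assumes a standard pinhole camera model, lifts a scale-invariant depth map to a camera-frame point map using known intrinsics, and then multiplies by a single scalar to obtain metric coordinates. The function names (`backproject`, `apply_metric_scale`) and all numeric values are hypothetical and chosen only for illustration; the abstract does not specify how UniScale parameterizes these outputs.

```python
from typing import List, Tuple

Point = Tuple[float, float, float]


def backproject(depth: List[List[float]],
                fx: float, fy: float,
                cx: float, cy: float) -> List[List[Point]]:
    """Lift a depth map to a camera-frame point map (pinhole model assumed)."""
    points = []
    for v, row in enumerate(depth):          # v: pixel row index
        out_row = []
        for u, z in enumerate(row):          # u: pixel column index
            x = (u - cx) / fx * z
            y = (v - cy) / fy * z
            out_row.append((x, y, z))
        points.append(out_row)
    return points


def apply_metric_scale(points: List[List[Point]],
                       scale: float) -> List[List[Point]]:
    """Convert a scale-invariant point map to metric units with one scalar."""
    return [[(scale * x, scale * y, scale * z) for (x, y, z) in row]
            for row in points]


# Hypothetical inputs: a 2x2 scale-invariant depth map, known intrinsics,
# and a predicted scene scale (all values are illustrative assumptions).
relative_depth = [[1.0, 1.0],
                  [2.0, 2.0]]
fx = fy = 2.0
cx = cy = 0.5
scale = 3.0

point_map = backproject(relative_depth, fx, fy, cx, cy)
metric_points = apply_metric_scale(point_map, scale)
```

Keeping the scale as a separate scalar means the geometry can be estimated up to scale from images alone, with metric units recovered in a final step. This mirrors the split between scale-invariant outputs and metric scale described in the abstract, under the assumptions stated above.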

Taxonomy

Topics: Advanced Vision and Imaging · Robotics and Sensor-Based Localization · Robot Manipulation and Learning