Towards Scalable Spatial Intelligence via 2D-to-3D Data Lifting

Xingyu Miao; Haoran Duan; Quanhao Qian; Jiuniu Wang; Yang Long; Ling Shao; Deli Zhao; Ran Xu; Gongjie Zhang

arXiv:2507.18678 · cs.CV · July 28, 2025

TL;DR

This paper introduces a scalable pipeline that converts single-view images into realistic 3D representations, significantly reducing data collection costs and enhancing spatial scene understanding for AI systems.

Contribution

The authors present a novel method for automatic, scale-aware 3D data generation from 2D images, bridging the gap between abundant 2D imagery and the data needs of 3D spatial intelligence.

Findings

01. Generated datasets improve 3D perception tasks
02. Method reduces data collection costs
03. Enhanced spatial reasoning in AI systems

Abstract

Spatial intelligence is emerging as a transformative frontier in AI, yet it remains constrained by the scarcity of large-scale 3D datasets. Unlike 2D imagery, which is abundant, 3D data typically requires specialized sensors and laborious annotation to acquire. In this work, we present a scalable pipeline that converts single-view images into comprehensive, scale- and appearance-realistic 3D representations (point clouds, camera poses, depth maps, and pseudo-RGBD) via integrated depth estimation, camera calibration, and scale calibration. Our method bridges the gap between the vast repository of imagery and the increasing demand for spatial scene understanding. By automatically generating authentic, scale-aware 3D data from images, we significantly reduce data collection costs and open new avenues for advancing spatial intelligence. We release two generated spatial datasets, i.e.,…
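The abstract names the pipeline stages (depth estimation, camera calibration, scale calibration) but not their internals. As a minimal sketch, assuming a standard pinhole camera model, the snippet shows how a single-view depth map, once intrinsics and a metric scale factor are available, could be lifted into a point cloud and paired with RGB as pseudo-RGBD. The `Intrinsics` fields, the median-ratio scale estimate against a few reference metric depths, and all function names are hypothetical illustrations, not the authors' implementation; the paper's actual depth estimator and calibration methods are not described in this summary.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class Intrinsics:
    """Pinhole intrinsics; in a real pipeline these would come from camera calibration (assumed here)."""
    fx: float
    fy: float
    cx: float
    cy: float


def estimate_scale(relative_depth, reference_depth):
    """Hypothetical scale calibration: median ratio of reference metric depth to predicted relative depth.

    Pairs where either value is non-positive are ignored.
    """
    ratios = [r / d for d, r in zip(relative_depth, reference_depth) if d > 0 and r > 0]
    if not ratios:
        raise ValueError("no valid depth pairs for scale estimation")
    return median(ratios)


def backproject(depth, K, scale=1.0):
    """Lift an H x W depth map (list of rows) to camera-frame (X, Y, Z) points via the pinhole model."""
    points = []
    for v, row in enumerate(depth):
        for u, d in enumerate(row):
            z = d * scale
            if z <= 0:
                continue  # skip pixels with missing or invalid depth
            x = (u - K.cx) * z / K.fx
            y = (v - K.cy) * z / K.fy
            points.append((x, y, z))
    return points


def pseudo_rgbd(rgb, depth, scale=1.0):
    """Attach scaled depth as a fourth channel to every (R, G, B) pixel."""
    return [
        [(r, g, b, d * scale) for (r, g, b), d in zip(rgb_row, depth_row)]
        for rgb_row, depth_row in zip(rgb, depth)
    ]


if __name__ == "__main__":
    # Toy 2x2 example with made-up intrinsics and depths (illustrative values only).
    K = Intrinsics(fx=2.0, fy=2.0, cx=0.5, cy=0.5)
    relative = [[1.0, 1.0], [2.0, 2.0]]
    s = estimate_scale([1.0, 2.0], [3.0, 6.0])
    print("scale:", s)
    print("points:", backproject(relative, K, s))
    print("pseudo-RGBD:", pseudo_rgbd([[(255, 0, 0), (0, 255, 0)], [(0, 0, 255), (9, 9, 9)]], relative, s))
```

The points come out in the camera frame. With a single view, the camera pose can be taken as the identity; placing points from many images in a shared world frame would additionally require an estimated pose per image. The median ratio is chosen here only because it is robust to a few outlier pixels, not because the source specifies it.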
