PixCuboid: Room Layout Estimation from Multi-view Featuremetric Alignment

Gustav Hanning; Kalle Åström; Viktor Larsson

arXiv:2508.04659 · cs.CV · August 7, 2025

TL;DR

PixCuboid is an optimization-based multi-view method for estimating cuboid-shaped room layouts. It outperforms existing, predominantly single-view approaches on two new benchmarks and can be extended to multi-room scenarios.

Contribution

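Introduces PixCuboid, a multi-view approach to room layout estimation based on aligning dense deep features. The feature maps are trained end-to-end through the optimization, and two new benchmarks based on ScanNet++ and 2D-3D-Semantics are proposed for evaluation.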

Findings

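01. Significantly outperforms competing methods on the two new benchmarks (ScanNet++ and 2D-3D-Semantics).

02. Feature maps learned end-to-end give large convergence basins and smooth loss landscapes, so the layout can be initialized with simple heuristics.

03. Although trained on single cuboids, the method extends to multi-room layouts.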

Abstract

Coarse room layout estimation provides important geometric cues for many downstream tasks. Current state-of-the-art methods are predominantly based on single views and often assume panoramic images. We introduce PixCuboid, an optimization-based approach for cuboid-shaped room layout estimation, which is based on multi-view alignment of dense deep features. By training with the optimization end-to-end, we learn feature maps that yield large convergence basins and smooth loss landscapes in the alignment. This allows us to initialize the room layout using simple heuristics. For the evaluation we propose two new benchmarks based on ScanNet++ and 2D-3D-Semantics, with manually verified ground truth 3D cuboids. In thorough experiments we validate our approach and significantly outperform the competition. Finally, while our network is trained with single cuboids, the flexibility of the…
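
To make the idea of featuremetric alignment concrete, the following is a minimal sketch of the kind of residual such an optimization could minimize: 3D points tied to a cuboid hypothesis are projected into two views, and the features sampled at the projections are compared. Everything here is an illustrative assumption rather than the authors' implementation: the (center, size, yaw) cuboid parametrization, the use of the eight corners as sample points, the nearest-pixel lookup, and all function names are hypothetical.

```python
import numpy as np

def cuboid_corners(center, size, yaw):
    """Return the 8 corners (8x3) of a gravity-aligned cuboid.
    Assumed parametrization: center (3,), size (3,), yaw about the z axis."""
    signs = np.array([[sx, sy, sz] for sx in (-1, 1)
                      for sy in (-1, 1) for sz in (-1, 1)], dtype=float)
    local = 0.5 * signs * np.asarray(size, dtype=float)
    c, s = np.cos(yaw), np.sin(yaw)
    R = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    return local @ R.T + np.asarray(center, dtype=float)

def project(points, K, R, t):
    """Pinhole projection of world points (Nx3) to pixel coordinates (Nx2)."""
    cam = points @ R.T + t          # world frame -> camera frame
    uv = cam @ K.T                  # apply intrinsics
    return uv[:, :2] / uv[:, 2:3]   # perspective division

def sample(feat, uv):
    """Nearest-pixel lookup in an HxWxC feature map, clamped to the image."""
    h, w = feat.shape[:2]
    u = np.clip(np.rint(uv[:, 0]).astype(int), 0, w - 1)
    v = np.clip(np.rint(uv[:, 1]).astype(int), 0, h - 1)
    return feat[v, u]

def featuremetric_cost(points, view_a, view_b):
    """Sum of squared feature differences between two views of the same points."""
    fa = sample(view_a["feat"], project(points, view_a["K"], view_a["R"], view_a["t"]))
    fb = sample(view_b["feat"], project(points, view_b["K"], view_b["R"], view_b["t"]))
    return float(np.sum((fa - fb) ** 2))
```

An optimizer would adjust the cuboid parameters to reduce this cost across all view pairs. A differentiable version would replace the nearest-pixel lookup with interpolation so that gradients can flow back to the cuboid parameters and, during training, to the feature network.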
