Rooms from Motion: Un-posed Indoor 3D Object Detection as Localization and Mapping

Justin Lazarow; Kai Kang; Afshin Dehghan

arXiv:2505.23756·cs.CV·May 30, 2025

TL;DR

This paper introduces Rooms from Motion (RfM), a method for 3D object detection and scene mapping from un-posed images. It uses object-centric matching of image-derived 3D boxes to estimate metric camera poses and produce a semantic 3D object map.

Contribution

It presents an object-centric framework that estimates metric camera poses and builds 3D object maps from un-posed images, and it reports higher-quality maps than leading point-based methods.

Findings

01

Achieves strong localization performance on CA-1M and ScanNet++ datasets.

02

Produces higher quality 3D object maps than leading point-based methods.

03

Extends scene understanding to full scenes with sparse, object-centric representations.

Abstract

We revisit scene-level 3D object detection as the output of an object-centric framework capable of both localization and mapping using 3D oriented boxes as the underlying geometric primitive. While existing 3D object detection approaches operate globally and implicitly rely on the a priori existence of metric camera poses, our method, Rooms from Motion (RfM), operates on a collection of un-posed images. By replacing the standard 2D keypoint-based matcher of structure-from-motion with an object-centric matcher based on image-derived 3D boxes, we estimate metric camera poses, object tracks, and finally produce a global, semantic 3D object map. When a priori pose is available, we can significantly improve map quality through optimization of global 3D boxes against individual observations. RfM shows strong localization performance and subsequently produces maps of higher quality than leading…
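To make the idea of estimating a metric camera pose from matched 3D boxes concrete, the sketch below aligns the centers of boxes that were detected in two camera frames and already matched to each other. It is not the authors' implementation: the abstract does not describe RfM's matcher or solver, so the function name `estimate_relative_pose`, the use of box centers only, and the choice of the Kabsch least-squares alignment are all assumptions made for illustration. It needs at least three non-collinear matched boxes.

```python
import numpy as np

def estimate_relative_pose(centers_a, centers_b):
    """Return (R, t) such that centers_b is approximately R @ centers_a + t.

    Hypothetical helper: centers_a and centers_b are (N, 3) arrays of matched
    3D box centers, in metric units, expressed in camera frames A and B.
    """
    a = np.asarray(centers_a, dtype=float)
    b = np.asarray(centers_b, dtype=float)

    # Remove the centroids so that only the rotation remains to be solved.
    mu_a, mu_b = a.mean(axis=0), b.mean(axis=0)
    H = (a - mu_a).T @ (b - mu_b)  # 3x3 cross-covariance matrix

    # Kabsch: the optimal rotation comes from the SVD of the cross-covariance.
    U, _, Vt = np.linalg.svd(H)
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0  # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T

    # The translation maps the rotated centroid of A onto the centroid of B.
    t = mu_b - R @ mu_a
    return R, t
```

Because the box centers are assumed to be in metric units, the recovered translation is metric as well, which is one reason 3D boxes are an appealing matching primitive compared with 2D keypoints. A fuller treatment would presumably also use box orientation and size and would reject bad matches, none of which is attempted here.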
