TL;DR
FSTM learns geometry and semantics for indoor 3D reconstruction in two steps, a geometry warm-up followed by semantic learning, and improves on the speed and accuracy of existing multi-SDF methods.
Contribution
A streamlined two-step method that separates geometry warm-up from semantic learning, enhancing scalability and performance without complex multi-SDF designs.
Findings
Trains 2.3x faster on the Replica dataset.
Improves robustness to real-world imperfections on ScanNet++.
Recovers more object surfaces, increasing recall.
Abstract
Neural Surface Reconstruction has become a standard methodology for indoor 3D reconstruction, with Signed Distance Functions (SDFs) proving particularly effective for representing scene geometry. A variety of applications require a detailed understanding of the scene context, driving the need for object-level semantic signals. While recent methods successfully integrate semantic labels, they often inherit the slow training time and limited scalability of multi-SDF learning. In this paper, we introduce FSTM, a unified approach for learning geometry and semantics through a two-step process: a geometry warm-up using RGB inputs and geometric cues, followed by semantic field estimation. By first optimising geometry without semantic supervision, we observe substantial improvements compared to the standard joint optimisation. Rather than relying on specialised modules or complex multi-SDF…
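The two-step process described in the abstract can be illustrated with a loss-weight schedule. This is a minimal sketch, not the authors' implementation: the names `LossWeights`, `loss_weights`, `total_loss`, and `warmup_steps` are hypothetical, the unit weights are placeholders, and keeping the RGB and geometric terms active during the semantic step is an assumption the abstract does not confirm.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LossWeights:
    """Per-term loss weights for one training step (hypothetical names)."""
    rgb: float
    geometry: float
    semantic: float


def loss_weights(step: int, warmup_steps: int) -> LossWeights:
    """Return the loss weights for a two-step schedule.

    Step 1 (step < warmup_steps): geometry warm-up with RGB and
    geometric cues only, so the semantic term is switched off.
    Step 2 (step >= warmup_steps): the semantic term is switched on.
    Assumption: RGB and geometry terms stay active in step 2.
    """
    if step < 0 or warmup_steps < 0:
        raise ValueError("step and warmup_steps must be non-negative")
    if step < warmup_steps:
        return LossWeights(rgb=1.0, geometry=1.0, semantic=0.0)
    return LossWeights(rgb=1.0, geometry=1.0, semantic=1.0)


def total_loss(rgb_loss: float, geometry_loss: float, semantic_loss: float,
               step: int, warmup_steps: int) -> float:
    """Combine the three loss terms using the schedule above."""
    w = loss_weights(step, warmup_steps)
    return (w.rgb * rgb_loss
            + w.geometry * geometry_loss
            + w.semantic * semantic_loss)
```

In a training loop, `total_loss` would be evaluated once per iteration with the current step count, so that semantic supervision only begins to influence optimisation after the warm-up ends.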
