TL;DR
This paper introduces MitUNet, a hybrid neural network that pairs a Mix-Transformer encoder with an attention-enhanced U-Net decoder to segment walls in 2D floor plans with high precision, as a basis for 3D reconstruction of indoor spaces.
Contribution
The novel MitUNet architecture effectively combines global context and fine details, improving wall segmentation accuracy over existing methods.
Findings
MitUNet outperforms standard models on CubiCasa5k and a regional dataset.
It achieves high boundary accuracy and structural correctness in wall masks.
The approach balances precision and recall using Tversky loss.
Abstract
Automatic 3D reconstruction of indoor spaces from 2D floor plans necessitates high-precision semantic segmentation of structural elements, particularly walls. However, existing methods often struggle with detecting thin structures and maintaining geometric precision. To address this, we introduce MitUNet, a hybrid neural network designed to bridge the gap between global semantic context and fine-grained structural details. Our architecture combines a Mix-Transformer encoder with a U-Net decoder enhanced with spatial and channel attention blocks. Optimized with the Tversky loss function, this approach achieves a balance between precision and recall, ensuring accurate boundary recovery. Experiments on the CubiCasa5k dataset and the regional dataset demonstrate MitUNet's superiority in generating structurally correct masks with high boundary accuracy, outperforming standard models. This…
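The Tversky loss mentioned in the abstract can be illustrated with a short sketch. It uses the standard definition of the Tversky index, TP / (TP + alpha * FP + beta * FN), with the loss taken as one minus that index. The function name, the flat-list input format, the smoothing term `eps`, and the default `alpha` and `beta` values are assumptions made for illustration; the summary above does not state the weights or the implementation the authors used.

```python
def tversky_loss(pred, target, alpha=0.5, beta=0.5, eps=1e-7):
    """Tversky loss for a binary mask (illustrative sketch, not the authors' code).

    pred:   flat sequence of predicted wall probabilities in [0, 1]
    target: flat sequence of ground-truth labels (0 = background, 1 = wall)
    alpha:  weight on false positives (assumed value, not from the paper)
    beta:   weight on false negatives (assumed value, not from the paper)
    """
    tp = fp = fn = 0.0
    for p, t in zip(pred, target):
        tp += p * t              # soft true positives
        fp += p * (1 - t)        # soft false positives
        fn += (1 - p) * t        # soft false negatives
    index = (tp + eps) / (tp + alpha * fp + beta * fn + eps)
    return 1.0 - index


# Toy example: a prediction that over-segments a single wall pixel.
pred = [1.0, 1.0, 1.0, 1.0]
target = [1, 0, 0, 0]
print(tversky_loss(pred, target, alpha=0.3, beta=0.7))  # lighter penalty on false positives
print(tversky_loss(pred, target, alpha=0.7, beta=0.3))  # heavier penalty on false positives
```

With alpha = beta = 0.5 the index reduces to the Dice coefficient. Raising alpha penalizes false positives more, which favors precision; raising beta penalizes missed wall pixels more, which favors recall. Tuning that pair is the precision-recall trade-off the abstract refers to.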
