A Unified Framework for 3D Scene Understanding
Wei Xu, Chunsheng Shi, Sifan Tu, Xin Zhou, Dingkang Liang, Xiang Bai

TL;DR
UniSeg3D is a unified Transformer-based framework that handles six 3D segmentation tasks (panoptic, semantic, instance, interactive, referring, and open-vocabulary) within a single model, and it is reported to outperform task-specific methods on ScanNet20, ScanRefer, and ScanNet200.
Contribution
The paper presents a novel unified model for 3D scene understanding that integrates six tasks into one framework, facilitating inter-task knowledge sharing and improved performance.
Findings
Outperforms state-of-the-art methods on ScanNet20, ScanRefer, and ScanNet200.
Effectively unifies multiple segmentation tasks within a single model.
Utilizes knowledge distillation and contrastive learning for inter-task knowledge transfer.
Abstract
We propose UniSeg3D, a unified 3D scene understanding framework that performs panoptic, semantic, instance, interactive, referring, and open-vocabulary segmentation within a single model. Most previous 3D segmentation approaches are tailored to a specific task, limiting their understanding of 3D scenes to a task-specific perspective. In contrast, the proposed method casts all six tasks into unified representations processed by the same Transformer. It facilitates inter-task knowledge sharing, thereby promoting comprehensive 3D scene understanding. To take advantage of multi-task unification, we enhance performance by establishing explicit inter-task associations. Specifically, we design knowledge distillation and contrastive learning methods to transfer task-specific knowledge across different tasks. Experiments on three benchmarks, including ScanNet20, ScanRefer, and…
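The abstract names knowledge distillation and contrastive learning as the mechanisms for inter-task transfer but does not give their formulas here. As a rough illustration only, the sketch below shows the generic textbook forms of both losses: an InfoNCE-style contrastive loss over paired embeddings and a temperature-softened KL distillation loss. The function names, temperatures, and the pairing of inputs are assumptions made for this example and are not taken from the paper; UniSeg3D's actual losses may differ.

```python
import math

def _normalize(v):
    # L2-normalize a vector (assumes v is not all zeros).
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def _log_softmax(row):
    # Numerically stable log-softmax over a list of floats.
    m = max(row)
    log_z = m + math.log(sum(math.exp(x - m) for x in row))
    return [x - log_z for x in row]

def info_nce_loss(anchors, positives, temperature=0.07):
    """Generic InfoNCE loss (hypothetical helper, not the paper's loss).

    Row i of `anchors` is treated as matching row i of `positives`;
    every other row of `positives` acts as a negative.
    """
    a = [_normalize(v) for v in anchors]
    p = [_normalize(v) for v in positives]
    total = 0.0
    for i, ai in enumerate(a):
        # Cosine similarity of anchor i to every candidate, scaled by temperature.
        logits = [sum(x * y for x, y in zip(ai, pj)) / temperature for pj in p]
        total -= _log_softmax(logits)[i]
    return total / len(a)

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Generic KL(teacher || student) on temperature-softened distributions
    (hypothetical helper, not the paper's loss)."""
    total = 0.0
    for s_row, t_row in zip(student_logits, teacher_logits):
        log_s = _log_softmax([x / temperature for x in s_row])
        log_t = _log_softmax([x / temperature for x in t_row])
        total += sum(math.exp(lt) * (lt - ls) for lt, ls in zip(log_t, log_s))
    return total / len(student_logits)

# Toy usage: correctly paired embeddings should score a lower contrastive loss
# than mismatched ones, and a student that matches its teacher has zero KL.
eye = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
shifted = eye[1:] + eye[:1]
aligned_loss = info_nce_loss(eye, eye)
mismatched_loss = info_nce_loss(eye, shifted)
```

In a multi-task setting, such terms would typically be added to the per-task segmentation losses, with one task's predictions or embeddings serving as the teacher or positive for another. Which task plays which role in UniSeg3D is not stated in the text above.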
Taxonomy
Topics: Advanced Vision and Imaging · Image Processing and 3D Reconstruction · 3D Surveying and Cultural Heritage
Methods: Attention Is All You Need · Linear Layer · Multi-Head Attention · Softmax · Layer Normalization · Byte Pair Encoding · Label Smoothing · Position-Wise Feed-Forward Layer · Adam · Dense Connections
