A Unified Framework for 3D Scene Understanding

Wei Xu; Chunsheng Shi; Sifan Tu; Xin Zhou; Dingkang Liang; Xiang Bai

arXiv:2407.03263 · cs.CV · November 28, 2024 · 1 citation

TL;DR

UniSeg3D is a unified Transformer-based framework that handles six 3D segmentation tasks (panoptic, semantic, instance, interactive, referring, and open-vocabulary) in a single model, and it reports results surpassing state-of-the-art methods on ScanNet20, ScanRefer, and ScanNet200.

Contribution

The paper presents a unified model for 3D scene understanding that integrates six segmentation tasks into one framework, so that knowledge is shared across tasks and performance improves.

Findings

01. Outperforms state-of-the-art methods on ScanNet20, ScanRefer, and ScanNet200.

02. Effectively unifies multiple segmentation tasks within a single model.

03. Utilizes knowledge distillation and contrastive learning for inter-task knowledge transfer.

Abstract

We propose UniSeg3D, a unified 3D scene understanding framework that achieves panoptic, semantic, instance, interactive, referring, and open-vocabulary segmentation tasks within a single model. Most previous 3D segmentation approaches are typically tailored to a specific task, limiting their understanding of 3D scenes to a task-specific perspective. In contrast, the proposed method unifies six tasks into unified representations processed by the same Transformer. It facilitates inter-task knowledge sharing, thereby promoting comprehensive 3D scene understanding. To take advantage of multi-task unification, we enhance performance by establishing explicit inter-task associations. Specifically, we design knowledge distillation and contrastive learning methods to transfer task-specific knowledge across different tasks. Experiments on three benchmarks, including ScanNet20, ScanRefer, and…
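The abstract names knowledge distillation and contrastive learning as the two mechanisms for inter-task transfer but does not give their exact form here. As a rough illustration only, the sketch below shows the generic textbook versions of both ideas: an InfoNCE-style contrastive loss that pulls matched embedding pairs together, and a mean-squared-error distillation loss that pulls a student's predictions toward a teacher's. The function names (`info_nce`, `distillation_mse`), the temperature value, and the toy vectors are all assumptions made for this example; none of it is taken from the UniSeg3D code.

```python
import math


def _normalize(vector):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vector)) or 1.0
    return [x / norm for x in vector]


def info_nce(anchors, positives, temperature=0.1):
    """Generic InfoNCE loss: anchors[i] should match positives[i] and no other row.

    Hypothetical helper for illustration, not the paper's loss.
    """
    a = [_normalize(v) for v in anchors]
    p = [_normalize(v) for v in positives]
    total = 0.0
    for i, a_i in enumerate(a):
        # Cosine similarity of anchor i against every candidate, scaled by temperature.
        logits = [sum(x * y for x, y in zip(a_i, p_j)) / temperature for p_j in p]
        # Numerically stable log-sum-exp for the softmax denominator.
        peak = max(logits)
        log_denominator = peak + math.log(sum(math.exp(l - peak) for l in logits))
        # Cross-entropy with the matched pair (index i) as the target.
        total += log_denominator - logits[i]
    return total / len(a)


def distillation_mse(student, teacher):
    """Generic distillation loss: mean squared error between student and teacher outputs.

    Hypothetical helper for illustration, not the paper's loss.
    """
    count = sum(len(row) for row in student)
    squared = sum(
        (s - t) ** 2
        for s_row, t_row in zip(student, teacher)
        for s, t in zip(s_row, t_row)
    )
    return squared / count


# Toy data: two "vision" embeddings and two "text" embeddings (made up for the example).
vision = [[1.0, 0.0], [0.0, 1.0]]
text = [[0.9, 0.1], [0.1, 0.9]]

aligned = info_nce(vision, text)         # each row paired with its true match
shuffled = info_nce(vision, text[::-1])  # pairs deliberately mismatched
print("aligned loss is lower:", aligned < shuffled)
```

In a multi-task setting such as the one described, the "anchors" and "positives" could stand for representations of the same object produced by two different tasks, and the "teacher" for the output of whichever task is more reliable; how UniSeg3D actually chooses these pairings is specified in the paper, not here.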

Code & Models

Repositories

dk-liang/uniseg3d (PyTorch, official)

Videos

A Unified Framework for 3D Scene Understanding · SlidesLive

Taxonomy

Topics: Advanced Vision and Imaging · Image Processing and 3D Reconstruction · 3D Surveying and Cultural Heritage

Methods: Attention Is All You Need · Linear Layer · Multi-Head Attention · Softmax · Layer Normalization · Byte Pair Encoding · Label Smoothing · Position-Wise Feed-Forward Layer · Adam · Dense Connections