TL;DR
This paper introduces Laser, a method for efficient language-guided 3D scene segmentation in neural radiance fields. It distills dense CLIP features directly and adds dedicated modules that improve segmentation accuracy, speed, and cross-view consistency.
Contribution
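Laser streamlines language-guided 3D segmentation by distilling dense CLIP features directly, rather than multi-scale ones, and introduces modules for noise reduction (an adapter trained with a self-cross-training strategy), edge accuracy (low-rank transient query attention), and viewpoint consistency (a label volume that recasts segmentation as classification).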
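Once a field has been distilled into CLIP's embedding space, a text prompt can select regions by comparing each point's feature with the prompt's text embedding. The sketch below is not Laser's published query procedure; it assumes that point features and text embeddings already share one embedding space, uses toy 3-D vectors instead of real CLIP outputs, and the function names and the temperature of 0.1 are hypothetical choices for illustration.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors; returns 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def segment_by_text(
    point_features: List[Sequence[float]],
    text_embeddings: Dict[str, Sequence[float]],
    temperature: float = 0.1,  # hypothetical value, not taken from the paper
) -> List[Tuple[str, float]]:
    """Label each point with its best-matching prompt and that prompt's softmax probability."""
    labels = list(text_embeddings)
    results = []
    for feature in point_features:
        sims = [cosine_similarity(feature, text_embeddings[lbl]) for lbl in labels]
        # Temperature-scaled softmax over prompts; subtracting the max keeps exp() stable.
        top = max(sims)
        exps = [math.exp((s - top) / temperature) for s in sims]
        total = sum(exps)
        probs = [e / total for e in exps]
        best = max(range(len(labels)), key=probs.__getitem__)
        results.append((labels[best], probs[best]))
    return results


# Toy 3-D vectors standing in for CLIP-space embeddings (hypothetical data).
prompts = {"chair": [1.0, 0.0, 0.0], "table": [0.0, 1.0, 0.0]}
points = [[0.9, 0.1, 0.0], [0.2, 0.8, 0.1]]
print(segment_by_text(points, prompts))
```

The softmax does not change which prompt wins (argmax of the probabilities equals argmax of the similarities), but it yields a per-point confidence that can be thresholded to leave low-confidence points unlabeled.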
Findings
Outperforms state-of-the-art methods in speed and accuracy
Achieves precise 3D segmentation with reduced computational resources
Enhances segmentation consistency across viewpoints
Abstract
In this work, we propose a method that leverages CLIP feature distillation, achieving efficient 3D segmentation through language guidance. Unlike previous methods that rely on multi-scale CLIP features and are limited by processing speed and storage requirements, our approach aims to streamline the workflow by directly and effectively distilling dense CLIP features, thereby achieving precise segmentation of 3D scenes using text. To achieve this, we introduce an adapter module and mitigate the noise issue in the dense CLIP feature distillation process through a self-cross-training strategy. Moreover, to enhance the accuracy of segmentation edges, this work presents a low-rank transient query attention mechanism. To ensure the consistency of segmentation for similar colors under different viewpoints, we convert the segmentation task into a classification task through label volume, which…
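The abstract does not spell out how the dense CLIP features are distilled into the radiance field. A common recipe for feature-field distillation, assumed here and not confirmed as Laser's exact loss, is to volume-render a per-sample feature along each camera ray with the standard NeRF weights and penalize its cosine distance to the 2D dense CLIP feature of the same pixel. The minimal sketch below shows only that baseline step; the adapter, self-cross-training, low-rank transient query attention, and label volume are not modeled, and all function names are hypothetical.

```python
import math
from typing import List, Sequence


def render_weights(densities: Sequence[float], deltas: Sequence[float]) -> List[float]:
    """NeRF quadrature weights w_i = T_i * (1 - exp(-sigma_i * delta_i))."""
    weights = []
    transmittance = 1.0  # T_1 = 1: nothing occludes the first sample
    for sigma, delta in zip(densities, deltas):
        alpha = 1.0 - math.exp(-sigma * delta)
        weights.append(transmittance * alpha)
        transmittance *= 1.0 - alpha  # T_{i+1} = T_i * exp(-sigma_i * delta_i)
    return weights


def render_feature(
    densities: Sequence[float],
    deltas: Sequence[float],
    features: Sequence[Sequence[float]],
) -> List[float]:
    """Alpha-composite per-sample feature vectors along one ray."""
    weights = render_weights(densities, deltas)
    dim = len(features[0])
    return [sum(w * f[d] for w, f in zip(weights, features)) for d in range(dim)]


def distillation_loss(rendered: Sequence[float], target: Sequence[float]) -> float:
    """1 - cosine similarity to the teacher's dense CLIP feature at the same pixel."""
    dot = sum(r * t for r, t in zip(rendered, target))
    norm = math.sqrt(sum(r * r for r in rendered)) * math.sqrt(sum(t * t for t in target))
    return 1.0 - dot / norm


# A nearly opaque first sample dominates the rendered feature (toy 2-D features).
rendered = render_feature([100.0, 1.0], [1.0, 1.0], [[1.0, 0.0], [0.0, 1.0]])
print(distillation_loss(rendered, [1.0, 0.0]))  # close to 0
```

Because the weights along a ray sum to 1 minus the final transmittance, a ray that passes through empty space renders a near-zero feature; this is one reason practical pipelines normalize features before comparing them with CLIP embeddings.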
Taxonomy
Methods: Softmax · Attention Is All You Need · Contrastive Language-Image Pre-training · SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · Adapter
