Query3D: LLM-Powered Open-Vocabulary Scene Segmentation with Language Embedded 3D Gaussian
Amirhosein Chahe, Lifeng Zhou

TL;DR
This paper presents Query3D, a novel open-vocabulary 3D scene segmentation method that uses language-embedded Gaussian representations and large language models to improve autonomous driving scene understanding.
Contribution
It introduces a new approach combining language-embedded 3D Gaussians with LLMs for enhanced scene segmentation and demonstrates effective fine-tuning for on-device deployment.
Findings
LLM-guided segmentation outperforms traditional methods.
Smaller fine-tuned models achieve comparable performance to larger models.
Larger models better utilize semantic information from helping positive words.
Abstract
This paper introduces a novel method for open-vocabulary 3D scene querying in autonomous driving by combining Language Embedded 3D Gaussians with Large Language Models (LLMs). We propose utilizing LLMs to generate both contextually canonical phrases and helping positive words for enhanced segmentation and scene interpretation. Our method leverages GPT-3.5 Turbo as an expert model to create a high-quality text dataset, which we then use to fine-tune smaller, more efficient LLMs for on-device deployment. Our comprehensive evaluation on the WayveScenes101 dataset demonstrates that LLM-guided segmentation significantly outperforms traditional approaches based on predefined canonical phrases. Notably, our fine-tuned smaller models achieve performance comparable to larger expert models while maintaining faster inference times. Through ablation studies, we discover that the effectiveness of…
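As a rough illustration of how LLM-generated canonical phrases and helping positive words could enter a segmentation query, the sketch below scores one language-embedded scene point using a LERF-style pairwise-softmax relevancy. This is an assumption for illustration only: the abstract does not state the scoring rule Query3D uses, and the function names, the `temperature` value, and the toy 3-dimensional embeddings are all hypothetical stand-ins for real text and Gaussian embeddings.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def relevancy(point_emb, positive_embs, canonical_embs, temperature=0.1):
    """Relevancy of one embedded scene point to a query (hypothetical sketch).

    positive_embs:  embeddings of the query and any helping positive words.
    canonical_embs: embeddings of the contrasting canonical phrases.
    For each positive, a two-way softmax is taken against every canonical
    phrase and the worst case is kept; the best positive then wins.
    """
    best = 0.0
    for pos in positive_embs:
        s_pos = cosine(point_emb, pos) / temperature
        worst = 1.0
        for canon in canonical_embs:
            s_can = cosine(point_emb, canon) / temperature
            # Probability that the point matches the positive rather than
            # this canonical phrase (softmax over the two similarities).
            p = 1.0 / (1.0 + math.exp(s_can - s_pos))
            worst = min(worst, p)
        best = max(best, worst)
    return best


# Toy embeddings (assumed): axis 0 stands for "car", axis 1 for "road".
positives = [[1.0, 0.0, 0.0], [0.9, 0.0, 0.1]]   # query + one helping word
canonicals = [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]  # LLM-chosen contrast phrases

car_point = [1.0, 0.1, 0.0]
road_point = [0.0, 1.0, 0.1]

car_score = relevancy(car_point, positives, canonicals)
road_score = relevancy(road_point, positives, canonicals)
```

A point would be assigned to the queried class when its score exceeds a threshold such as 0.5. Under this rule, better-chosen canonical phrases sharpen the contrast and extra positive words give a point more chances to match.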
Taxonomy
Topics: Multimodal Machine Learning Applications · Natural Language Processing Techniques · Advanced Neural Network Applications
Methods: Cosine Annealing · Residual Connection · Linear Layer · Linear Warmup With Cosine Annealing · Weight Decay · Softmax · Attention Dropout · Dropout
