ChatSplat: 3D Conversational Gaussian Splatting
Hanlin Chen, Fangyin Wei, Gim Hee Lee

TL;DR
ChatSplat builds a 3D language field that supports chat-based interaction at three levels (object, view, and scene). It encodes rendered feature maps into tokens for a large language model and uses a learnable normalization for language embeddings.
Contribution
The paper presents ChatSplat, a system that constructs a 3D language field supporting interaction at the object, view, and scene levels, with a dedicated encoding strategy for each level and a learnable normalization for language embedding learning.
Findings
Supports interaction at the object, view, and scene levels in 3D space
Uses a learnable normalization for language embeddings
Demonstrates improved scene understanding and engagement
Abstract
Humans naturally interact with their 3D surroundings using language, and modeling 3D language fields for scene understanding and interaction has gained growing interest. This paper introduces ChatSplat, a system that constructs a 3D language field, enabling rich chat-based interaction within 3D space. Unlike existing methods that primarily use CLIP-derived language features focused solely on segmentation, ChatSplat facilitates interaction on three levels: objects, views, and the entire 3D scene. For view-level interaction, we designed an encoder that encodes the rendered feature map of each view into tokens, which are then processed by a large language model (LLM) for conversation. At the scene level, ChatSplat combines multi-view tokens, enabling interactions that consider the entire scene. For object-level interaction, ChatSplat uses a patch-wise language embedding, unlike LangSplat's…
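The view-level and scene-level steps described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the encoder's architecture, so this example assumes plain patch-wise average pooling as a stand-in for the learned encoder, and simple concatenation for combining multi-view tokens. The names `encode_view_tokens` and `combine_scene_tokens` are hypothetical and do not come from the paper.

```python
from typing import List

FeatureMap = List[List[List[float]]]  # H x W x C rendered feature map
Tokens = List[List[float]]            # N x C token sequence


def encode_view_tokens(feature_map: FeatureMap, patch: int) -> Tokens:
    """Average-pool each patch x patch block into one token.

    Assumption: pooling stands in for the learned encoder, whose actual
    design is not given in the abstract.
    """
    height, width = len(feature_map), len(feature_map[0])
    channels = len(feature_map[0][0])
    tokens: Tokens = []
    for top in range(0, height - patch + 1, patch):
        for left in range(0, width - patch + 1, patch):
            token = [0.0] * channels
            for y in range(top, top + patch):
                for x in range(left, left + patch):
                    for c in range(channels):
                        token[c] += feature_map[y][x][c]
            tokens.append([v / (patch * patch) for v in token])
    return tokens


def combine_scene_tokens(views: List[Tokens]) -> Tokens:
    """Concatenate per-view tokens into one scene-level sequence (assumed scheme)."""
    return [token for view in views for token in view]


# Two toy 4x4 views with 2 channels each, filled with constant values.
view_a = [[[1.0, 1.0] for _ in range(4)] for _ in range(4)]
view_b = [[[3.0, 3.0] for _ in range(4)] for _ in range(4)]
tokens_a = encode_view_tokens(view_a, patch=2)
tokens_b = encode_view_tokens(view_b, patch=2)
scene_tokens = combine_scene_tokens([tokens_a, tokens_b])
```

In this toy setup each 4x4 view yields four tokens, and the scene-level sequence is the two views' tokens placed end to end. In the system the abstract describes, sequences of this kind are what the LLM would consume for conversation.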
Taxonomy
Topics: Misinformation and Its Impacts · Recommender Systems and Techniques
