Ilov3Splat: Instance-Level Open-Vocabulary 3D Scene Understanding in Gaussian Splatting

Binh Long Nguyen; Kien Nguyen; Sridha Sridharan; Clinton Fookes; Peyman Moghadam

arXiv:2605.04506 · cs.CV · May 14, 2026

TL;DR

Ilov3Splat is a framework for open-vocabulary 3D scene understanding that augments 3D Gaussian Splatting with language-aligned feature fields, enabling instance-level recognition without manual annotations.

Contribution

The method jointly optimizes scene geometry and semantic feature fields, using CLIP features for language grounding and SAM masks for instance discrimination, to enable language-driven 3D object selection and instance segmentation.

Findings

01 Outperforms prior methods in object selection and instance segmentation

02 Supports arbitrary object recognition based on natural language

03 Operates without category supervision or manual annotations
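
As a rough illustration of what recognition "based on natural language" can look like in practice, the sketch below ranks per-instance embeddings against a query embedding by cosine similarity. It is a generic pattern, not the paper's matching procedure: the function names `cosine` and `rank_instances`, the `min_score` threshold, and the three-dimensional toy vectors are all hypothetical. In a real system the vectors would come from a CLIP text encoder and from the scene's feature field.

```python
import math


def cosine(a, b):
    """Cosine similarity between two non-zero vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_instances(query, instances, min_score=0.0):
    """Return (name, score) pairs with score >= min_score, best match first.

    `query` stands in for a text embedding and `instances` maps an instance
    name to its embedding. Both are assumptions made for this sketch.
    """
    scored = [(name, cosine(query, emb)) for name, emb in instances.items()]
    kept = [(name, score) for name, score in scored if score >= min_score]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# Toy embeddings (hypothetical values, far smaller than real CLIP vectors).
instances = {
    "chair": [0.9, 0.1, 0.0],
    "lamp": [0.0, 1.0, 0.1],
    "table": [0.7, 0.2, 0.6],
}
query = [1.0, 0.0, 0.1]

ranked = rank_instances(query, instances, min_score=0.5)
for name, score in ranked:
    print(f"{name}: {score:.3f}")
```

Because the query is an arbitrary vector rather than a class index, nothing in this scheme depends on a fixed category list, which is the property the second and third findings describe.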

Abstract

We introduce Ilov3Splat, a novel framework for instance-level open-vocabulary 3D scene understanding built on 3D Gaussian Splatting (3D-GS). Most prior work depends on 2D rendering-based matching or point-level semantic association, which undermines cross-view consistency, lacks coherent instance-level reasoning, and limits precision in downstream 3D tasks. To address these limitations, our method jointly optimizes scene geometry and semantic representations by augmenting Gaussian splats with view-consistent feature fields. Specifically, we leverage multi-resolution hash embedding to efficiently encode language-aligned CLIP features, enabling dense and coherent language grounding in 3D space. We further train an instance feature field using a contrastive loss over SAM masks, supporting fine-grained object distinction across views. At inference time, CLIP-encoded queries are matched…
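
The abstract mentions a contrastive loss over SAM masks but does not give its form. The sketch below shows one common choice, a supervised-contrastive (InfoNCE-style) loss in which features sampled from the same mask are positives and all other features are negatives. The function name `mask_contrastive_loss`, the `temperature` value, and the toy 2-D features are assumptions made for illustration, and the paper's actual loss may differ.

```python
import math


def normalize(v):
    """Scale a non-zero vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def mask_contrastive_loss(features, mask_ids, temperature=0.1):
    """Supervised-contrastive style loss over mask-labelled features.

    features: list of feature vectors (e.g. rendered at sampled pixels).
    mask_ids: one integer per feature, the id of the mask it falls in.
    Anchors with no other feature in the same mask are skipped.
    """
    feats = [normalize(f) for f in features]
    n = len(feats)
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n)
                     if j != i and mask_ids[j] == mask_ids[i]]
        if not positives:
            continue
        # Temperature-scaled cosine similarity to every other feature.
        sims = {k: sum(a * b for a, b in zip(feats[i], feats[k])) / temperature
                for k in range(n) if k != i}
        # Log-sum-exp with the max subtracted for numerical stability.
        peak = max(sims.values())
        log_denom = peak + math.log(sum(math.exp(s - peak)
                                        for s in sims.values()))
        total += -sum(sims[p] - log_denom for p in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


# Toy check: two clusters of 2-D features (hypothetical values).
features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
consistent = mask_contrastive_loss(features, [0, 0, 1, 1])
inconsistent = mask_contrastive_loss(features, [0, 1, 0, 1])
print(consistent < inconsistent)
```

The loss is small when features inside one mask already agree and features from different masks point in different directions, so minimizing it pushes a feature field toward that arrangement.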

Code & Models

Repositories

https://csiro-robotics.github.io/Ilov3Splat