ShelfGaussian: Shelf-Supervised Open-Vocabulary Gaussian-based 3D Scene Understanding

Lingjun Zhao; Yandong Luo; James Hays; Lu Gan

arXiv:2512.03370 · cs.CV · April 13, 2026

TL;DR

ShelfGaussian is a novel 3D scene understanding framework that leverages off-the-shelf vision foundation models to enable open-vocabulary, multi-modal Gaussian representations for improved perception and planning in diverse environments.

Contribution

The paper introduces a Multi-Modal Gaussian Transformer, which lets Gaussians query features from multiple sensor modalities, and a Shelf-Supervised Learning Paradigm, which optimizes Gaussians with VFM features at both the 2D image level and the 3D scene level.

Findings

01. Achieves state-of-the-art zero-shot semantic occupancy prediction on Occ3D-nuScenes.

02. Demonstrates effective in-the-wild performance on urban scenarios with UGVs.

Abstract

We introduce ShelfGaussian, an open-vocabulary multi-modal Gaussian-based 3D scene understanding framework supervised by off-the-shelf vision foundation models (VFMs). Gaussian-based methods have demonstrated superior performance and computational efficiency across a wide range of scene understanding tasks. However, existing methods either model objects as closed-set semantic Gaussians supervised by annotated 3D labels, neglecting their rendering ability, or learn open-set Gaussian representations via purely 2D self-supervision, which degrades geometry and restricts them to camera-only settings. To fully exploit the potential of Gaussians, we propose a Multi-Modal Gaussian Transformer that enables Gaussians to query features from diverse sensor modalities, and a Shelf-Supervised Learning Paradigm that efficiently optimizes Gaussians with VFM features jointly at 2D image and 3D scene…

Code & Models

Repositories

https://lunarlab-gatech.github.io/ShelfGaussian