SLAG: Scalable Language-Augmented Gaussian Splatting

Laszlo Szilagyi; Francis Engelmann; Jeannette Bohg

arXiv:2505.08124 · cs.CV · August 19, 2025

TL;DR

SLAG is a scalable multi-GPU framework that significantly accelerates language-augmented Gaussian splatting for large-scale scene representations, suitable for time-sensitive robotics applications.

Contribution

It introduces a parallelized scene encoding method that computes per-Gaussian language embeddings without a loss function, and it incorporates a vector database for efficient embedding management.

Findings

01 Achieves an 18x speedup in embedding computation on 16 GPUs

02 Maintains embedding quality on the ScanNet and LERF datasets

03 Enables rapid scene encoding for large-scale robotics applications

Abstract

Language-augmented scene representations hold great promise for large-scale robotics applications such as search-and-rescue, smart cities, and mining. Many of these scenarios are time-sensitive, requiring rapid scene encoding while also being data-intensive, necessitating scalable solutions. Deploying these representations on robots with limited computational resources further adds to the challenge. To address this, we introduce SLAG, a multi-GPU framework for language-augmented Gaussian splatting that enhances the speed and scalability of embedding large scenes. Our method integrates 2D visual-language model features into 3D scenes using SAM and CLIP. Unlike prior approaches, SLAG eliminates the need for a loss function to compute per-Gaussian language embeddings. Instead, it derives embeddings from 3D Gaussian scene parameters via a normalized weighted average, enabling highly…
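The normalized weighted average mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `weighted_average_embedding`, the choice of weights, and the toy data are all assumptions made for illustration. The abstract only states that embeddings are derived from 3D Gaussian scene parameters via a normalized weighted average; here the weights are simply given as inputs, standing in for whatever per-Gaussian contribution the method actually uses.

```python
from typing import List, Sequence


def weighted_average_embedding(
    features: Sequence[Sequence[float]],
    weights: Sequence[float],
    eps: float = 1e-12,
) -> List[float]:
    """Normalized weighted average of feature vectors for one Gaussian.

    features: 2D language features (e.g. CLIP vectors) associated with this
              Gaussian; how they are gathered is an assumption of this sketch.
    weights:  non-negative contribution weights, one per feature (hypothetical).
    """
    if len(features) != len(weights):
        raise ValueError("features and weights must have the same length")
    if not features:
        raise ValueError("at least one feature vector is required")
    dim = len(features[0])
    total = sum(weights)
    if total <= eps:
        # No contribution at all: return a zero vector instead of dividing by zero.
        return [0.0] * dim
    # Closed-form average: sum_k w_k * f_k / sum_k w_k, computed per dimension.
    return [
        sum(w * f[d] for f, w in zip(features, weights)) / total
        for d in range(dim)
    ]


# Toy data: two 2D "language features" with weights 3 and 1.
embedding = weighted_average_embedding([[1.0, 0.0], [0.0, 1.0]], [3.0, 1.0])
print(embedding)
```

Because each Gaussian's average is a closed-form expression that does not depend on any other Gaussian, a computation of this shape needs no iterative optimization and can be split across workers, which is consistent with the parallelization the abstract describes.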

Taxonomy

Topics: Robotics and Automated Systems

Methods: Segment Anything Model · SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings · Contrastive Language-Image Pre-training