LatentAM: Real-Time, Large-Scale Latent Gaussian Attention Mapping via Online Dictionary Learning
Junwoon Lee, Yulun Tian

TL;DR
LatentAM is an online, scalable 3D Gaussian mapping framework that uses a model-agnostic dictionary learning approach to build latent feature maps from streaming RGB-D data, enabling open-vocabulary perception.
Contribution
It introduces a novel online dictionary learning method for 3D Gaussian mapping that is model-agnostic, pretraining-free, and scalable to large environments with efficient map management.
Findings
Achieves significantly better feature reconstruction fidelity than state-of-the-art methods.
Operates at near-real-time speeds of 12-35 FPS on large-scale datasets.
Successfully integrates with various VLMs without model-specific decoders.
Abstract
We present LatentAM, an online 3D Gaussian Splatting (3DGS) mapping framework that builds scalable latent feature maps from streaming RGB-D observations for open-vocabulary robotic perception. Instead of distilling high-dimensional Vision-Language Model (VLM) embeddings using model-specific decoders, LatentAM proposes an online dictionary learning approach that is both model-agnostic and pretraining-free, enabling plug-and-play integration with different VLMs at test time. Specifically, our approach associates each Gaussian primitive with a compact query vector that can be converted into approximate VLM embeddings using an attention mechanism with a learnable dictionary. The dictionary is initialized efficiently from streaming observations and optimized online to adapt to evolving scene semantics under trust-region regularization. To scale to long trajectories and large environments, we…
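As a rough illustration of the decoding step the abstract describes (a compact per-Gaussian query vector converted into an approximate VLM embedding by attention over a learnable dictionary), here is a minimal NumPy sketch. It is not the authors' implementation: the key/value split of the dictionary, the scaled dot-product softmax, the function name decode_embeddings, and all sizes (N, K, d_q, D) are assumptions chosen for illustration, and the online initialization and trust-region optimization are not shown.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def decode_embeddings(queries, dict_keys, dict_values):
    """Hypothetical decoder: attention over a dictionary.

    queries:     (N, d_q) compact query vectors, one per Gaussian
    dict_keys:   (K, d_q) dictionary keys (assumed parameterization)
    dict_values: (K, D)   dictionary atoms in the VLM embedding space
    returns:     (N, D) approximate embeddings and (N, K) attention weights
    """
    scale = np.sqrt(queries.shape[1])
    weights = softmax(queries @ dict_keys.T / scale, axis=-1)
    # Each embedding is a convex combination of dictionary atoms.
    return weights @ dict_values, weights

# Illustrative sizes only; the paper's actual dimensions are not given here.
rng = np.random.default_rng(0)
N, K, d_q, D = 5, 64, 16, 512
queries = rng.normal(size=(N, d_q))
dict_keys = rng.normal(size=(K, d_q))
dict_values = rng.normal(size=(K, D))

embeddings, weights = decode_embeddings(queries, dict_keys, dict_values)
print(embeddings.shape, weights.shape)
```

Under these assumptions each Gaussian stores only d_q numbers instead of D, and switching to a different VLM would mean swapping the dictionary atoms rather than retraining a model-specific decoder, which is one way to read the "model-agnostic" claim.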
Taxonomy
Topics: Multimodal Machine Learning Applications · Advanced Neural Network Applications · Robotics and Sensor-Based Localization
