LOST-3DSG: Lightweight Open-Vocabulary 3D Scene Graphs with Semantic Tracking in Dynamic Environments

Sara Micol Ferraina; Michele Brienza; Francesco Argenziano; Emanuele Musumeci; Vincenzo Suriani; Domenico D. Bloisi; Daniele Nardi

arXiv:2601.02905 · cs.RO · January 13, 2026

Open Access

TL;DR

LOST-3DSG introduces a lightweight, open-vocabulary 3D scene graph framework that efficiently tracks dynamic objects in real-world environments using semantic embeddings, outperforming heavy model-based approaches.

Contribution

The paper presents LOST-3DSG, a novel semantic tracking method that avoids dense visual features, enabling efficient real-time 3D scene graph construction in dynamic environments.

Findings

01 Outperforms existing methods in dynamic object tracking accuracy

02 Operates efficiently without dense visual feature storage

03 Validated through real-world robot experiments

Abstract

Tracking objects that move within dynamic environments is a core challenge in robotics. Recent research has advanced this topic significantly; however, many existing approaches remain inefficient due to their reliance on heavy foundation models. To address this limitation, we propose LOST-3DSG, a lightweight open-vocabulary 3D scene graph designed to track dynamic objects in real-world environments. Our method adopts a semantic approach to entity tracking based on word2vec and sentence embeddings, enabling an open-vocabulary representation while avoiding the necessity of storing dense CLIP visual features. As a result, LOST-3DSG achieves superior performance compared to approaches that rely on high-dimensional visual embeddings. We evaluate our method through qualitative and quantitative experiments conducted in a real 3D environment using a TIAGo robot. The results demonstrate the…
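To make the idea of label-based semantic tracking concrete, the following is a minimal sketch, not the authors' implementation. It assumes each scene-graph node stores only a text label, and that a newly detected object is associated with an existing node when the cosine similarity between their label embeddings exceeds a threshold. The vectors in `TOY_EMBEDDINGS`, the function `match_detection`, and the threshold value of 0.9 are all hypothetical and chosen for illustration; a real system would obtain embeddings from a word2vec or sentence-embedding model.

```python
import math

# Hypothetical toy label embeddings (assumption for illustration only).
# A real system would query a word2vec or sentence-embedding model instead.
TOY_EMBEDDINGS = {
    "mug": [0.9, 0.1, 0.0],
    "cup": [0.85, 0.2, 0.05],
    "chair": [0.0, 0.9, 0.3],
    "laptop": [0.1, 0.2, 0.95],
}


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def match_detection(label, nodes, threshold=0.9):
    """Return the id of the most similar existing node, or None.

    `nodes` maps node ids to text labels. The threshold is an assumed
    value, not one reported in the paper.
    """
    query = TOY_EMBEDDINGS[label]
    best_id, best_score = None, threshold
    for node_id, node_label in nodes.items():
        score = cosine_similarity(query, TOY_EMBEDDINGS[node_label])
        if score >= best_score:
            best_id, best_score = node_id, score
    return best_id


# Example scene graph with two tracked objects.
scene_nodes = {"n1": "cup", "n2": "chair"}
mug_match = match_detection("mug", scene_nodes)        # semantically close to "cup"
laptop_match = match_detection("laptop", scene_nodes)  # no sufficiently similar node
```

In this sketch a detection labelled "mug" is associated with the existing "cup" node, while "laptop" matches nothing and would be added as a new node. Storing only labels and their compact embeddings is what lets such an approach avoid keeping dense per-object visual features.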

Taxonomy

Topics: Multimodal Machine Learning Applications · Human Pose and Action Recognition · Social Robot Interaction and HRI