TL;DR
OpenSGA introduces a unified, efficient framework for 3D scene graph alignment that fuses vision-language, textual, and geometric features, significantly improving accuracy in open-world scenarios.
Contribution
The paper presents a novel scene graph alignment method that combines vision-language, textual, and geometric object features, and introduces a large-scale dataset, ScanNet-SG, for training and evaluation.
Findings
Achieves state-of-the-art performance on frame-to-scan (F2S) and subscan-to-subscan (S2S) alignment tasks.
Outperforms existing scene graph alignment methods.
Successfully handles large coordinate discrepancies.
Abstract
Scene graph alignment establishes object correspondences between two 3D scene graphs constructed from partially overlapping observations. This enables efficient scene understanding and object-level relocalization when a robot revisits a place, as well as global map fusion across multiple agents. Such capabilities are essential for robots that require long-term memory for long-horizon tasks involving interactions with the environment. Existing approaches mainly focus on subscan-to-subscan (S2S) alignment and depend heavily on geometric point-cloud features, leaving frame-to-scan (F2S) alignment and open-set vision-language features underexplored. In addition, existing datasets for scene graph alignment remain small-scale with limited object diversity, constraining systematic training and evaluation. We present a unified and efficient scene graph alignment framework that predicts object…
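The abstract does not spell out how OpenSGA computes correspondences, so the following is only a hypothetical illustration of what "establishing object correspondences between two scene graphs" means in practice, not the paper's method. It assumes each object node carries three embeddings (named `vl`, `text`, and `geom` here for vision-language, textual, and geometric features), fuses them by weighted concatenation, and keeps mutual nearest neighbours above a similarity threshold. The modality names, equal weights, fusion rule, 0.5 threshold, and mutual-nearest-neighbour matching are all assumptions chosen for clarity.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]


def l2_normalize(v: Vector) -> Vector:
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def fuse(features: Dict[str, Vector], weights: Dict[str, float]) -> Vector:
    """Hypothetical fusion: concatenate unit-normalized per-modality embeddings,
    each scaled by sqrt(weight), then renormalize. For non-zero inputs, the dot
    product of two fused vectors equals the weighted average of per-modality
    cosine similarities."""
    fused: Vector = []
    for name in sorted(weights):
        scale = math.sqrt(weights[name])
        fused.extend(scale * x for x in l2_normalize(features[name]))
    return l2_normalize(fused)


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two vectors that are already unit-length."""
    return sum(x * y for x, y in zip(a, b))


def mutual_nn_matches(
    src: List[Vector], dst: List[Vector], min_sim: float = 0.5
) -> List[Tuple[int, int, float]]:
    """Keep (i, j, sim) only if src[i] and dst[j] pick each other as best match
    and their similarity reaches min_sim; other objects stay unmatched, which
    models partially overlapping observations."""
    if not src or not dst:
        return []
    sim = [[cosine(s, d) for d in dst] for s in src]
    matches = []
    for i, row in enumerate(sim):
        j = max(range(len(dst)), key=lambda k: row[k])
        i_back = max(range(len(src)), key=lambda k: sim[k][j])
        if i_back == i and row[j] >= min_sim:
            matches.append((i, j, row[j]))
    return matches


# Usage with toy 2-D embeddings (illustrative values, not data from the paper).
weights = {"vl": 1.0, "text": 1.0, "geom": 1.0}
frame_objects = [
    {"vl": [1, 0], "text": [1, 0], "geom": [1, 0]},     # object A
    {"vl": [0, 1], "text": [0, 1], "geom": [0, 1]},     # object B
    {"vl": [-1, 0], "text": [-1, 0], "geom": [-1, 0]},  # object F, not in the scan
]
scan_objects = [
    {"vl": [0, 1], "text": [0, 1], "geom": [0.1, 1]},   # resembles B
    {"vl": [1, 0], "text": [1, 0.1], "geom": [1, 0]},   # resembles A
]
src = [fuse(o, weights) for o in frame_objects]
dst = [fuse(o, weights) for o in scan_objects]
matches = mutual_nn_matches(src, dst)
print([(i, j) for i, j, _ in matches])
```

Object F has no counterpart in the scan, so it is left unmatched rather than forced onto its nearest neighbour. Mutual nearest neighbours are used here only because they are simple and dependency-free; a one-to-one assignment solver or a learned matcher would be natural alternatives.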
