The Bare Necessities: Designing Simple, Effective Open-Vocabulary Scene Graphs
Christina Kassab, Matías Mattamala, Sacha Morin, Martin Büchner, Abhinav Valada, Liam Paull, Maurice Fallon

TL;DR
This paper critically evaluates design choices in open-vocabulary scene graph methods, proposing a simple, efficient framework that maintains high performance while significantly reducing computational costs.
Contribution
It introduces a general scene graph framework and identifies key strategies for balancing efficiency and accuracy in 3D scene understanding.
Findings
Commonly used image pre-processing techniques offer minimal performance gains while tripling computation per object view.
Averaging feature labels across different views significantly degrades performance.
Alternative feature selection strategies improve performance without adding unnecessary computational cost.
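The difference between fusing features by averaging and selecting a single view's feature can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the function names, the toy 2-D embeddings, and the per-view score used as the selection criterion are hypothetical and are not taken from the paper, which does not describe its selection strategies in this summary.

```python
import math


def normalize(vec):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def average_views(view_features):
    """Fusion by averaging: mean of the unit-normalized per-view embeddings."""
    unit = [normalize(v) for v in view_features]
    dim = len(unit[0])
    mean = [sum(v[i] for v in unit) / len(unit) for i in range(dim)]
    return normalize(mean)


def select_best_view(view_features, view_scores):
    """Selection: keep only the embedding of the highest-scoring view.

    The score is a hypothetical stand-in (e.g. a detection confidence).
    """
    best = max(range(len(view_scores)), key=lambda i: view_scores[i])
    return normalize(view_features[best])


def nearest_label(object_feature, label_features):
    """Return the label whose embedding has the highest cosine similarity."""
    query = normalize(object_feature)

    def cosine(name):
        return sum(a * b for a, b in zip(query, normalize(label_features[name])))

    return max(label_features, key=cosine)


# Toy data: one confident view of the object and two poor views.
views = [[1.0, 0.0], [0.2, 1.0], [0.1, 1.0]]
scores = [0.9, 0.3, 0.2]
labels = {"chair": [1.0, 0.0], "table": [0.0, 1.0]}

averaged_label = nearest_label(average_views(views), labels)
selected_label = nearest_label(select_best_view(views, scores), labels)
```

In this toy example the two poor views outvote the confident one when averaged, so the two strategies assign different labels to the same object. It only shows how the two strategies can disagree, not which one is correct for real data.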
Abstract
3D open-vocabulary scene graph methods are a promising map representation for embodied agents; however, many current approaches are computationally expensive. In this paper, we reexamine the critical design choices established in previous works to optimize both efficiency and performance. We propose a general scene graph framework and conduct three studies that focus on image pre-processing, feature fusion, and feature selection. Our findings reveal that commonly used image pre-processing techniques provide minimal performance improvement while tripling computation (on a per-object-view basis). We also show that averaging feature labels across different views significantly degrades performance. We study alternative feature selection strategies that enhance performance without adding unnecessary computational costs. Based on our findings, we introduce a computationally balanced approach…
Taxonomy
Topics: Multimodal Machine Learning Applications · Natural Language Processing Techniques · Topic Modeling
Methods: Feature Selection · Focus
