RREx-BoT: Remote Referring Expressions with a Bag of Tricks
Gunnar A. Sigurdsson, Jesse Thomason, Gaurav S. Sukhatme, Robinson Piramuthu

TL;DR
This paper introduces RREx-BoT, a simple approach to remote object grounding for household robots. It pairs a generic vision-language scoring model with 3D encoding and a set of tricks for handling large search spaces, and reports state-of-the-art results on REVERIE and FAO.
Contribution
The paper demonstrates that a generic vision-language scoring model, combined with specific tricks like 3D encoding, can significantly improve remote object grounding in embodied environments.
Findings
9.84% absolute performance gain over the prior state of the art on REVERIE
5.04% absolute performance gain over the prior state of the art on FAO
Effective real-world deployment on TurtleBot
Abstract
Household robots operate in the same space for years. Such robots incrementally build dynamic maps that can be used for tasks requiring remote object localization. However, benchmarks in robot learning often test generalization through inference on tasks in unobserved environments. In an observed environment, locating an object is reduced to choosing from among all object proposals in the environment, which may number in the 100,000s. Armed with this intuition, using only a generic vision-language scoring model with minor modifications for 3D encoding and operating in an embodied environment, we demonstrate an absolute performance gain of 9.84% on remote object grounding above state-of-the-art models for REVERIE and of 5.04% on FAO. When allowed to pre-explore an environment, we also exceed the previous state-of-the-art pre-exploration method on REVERIE. Additionally, we demonstrate our…
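The reduction described in the abstract, choosing one object from among all proposals in an observed environment, can be illustrated with a minimal sketch. This is not the authors' implementation: the cosine score stands in for the vision-language scoring model, and the names `ground_expression`, `cosine`, and the toy embeddings are hypothetical. Each proposal embedding is assumed to already carry whatever 3D position information the model uses.

```python
import heapq
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; a placeholder for a learned vision-language score."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def ground_expression(
    text_embedding: Sequence[float],
    proposals: Sequence[Tuple[str, Sequence[float]]],
    top_k: int = 1,
) -> List[Tuple[str, float]]:
    """Score every (object_id, embedding) proposal against the expression
    and return the top_k highest-scoring (object_id, score) pairs."""
    scored = ((obj_id, cosine(text_embedding, emb)) for obj_id, emb in proposals)
    # nlargest avoids fully sorting a search space that may hold 100,000s of proposals.
    return heapq.nlargest(top_k, scored, key=lambda pair: pair[1])


# Toy data (hypothetical): three proposals gathered from a pre-explored map.
proposals = [
    ("mug_kitchen", [1.0, 0.0, 0.0]),
    ("lamp_bedroom", [0.0, 1.0, 0.0]),
    ("mug_office", [0.9, 0.1, 0.0]),
]
best = ground_expression([1.0, 0.0, 0.0], proposals, top_k=2)
print(best)
```

In this toy setup the expression embedding matches `mug_kitchen` exactly, so it ranks first, with `mug_office` second. A real system would replace `cosine` with the model's learned score and would likely batch the computation on an accelerator.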
Taxonomy
Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques
Methods: Test
