Sim-To-Real Transfer of Visual Grounding for Human-Aided Ambiguity Resolution
Georgios Tziafas, Hamidreza Kasaei

TL;DR
This paper introduces a modular, decoupled framework for visual grounding that leverages synthetic scene graph annotations, enabling effective Sim-To-Real transfer for robotic understanding of natural language instructions.
Contribution
It proposes a fully decoupled modular approach for compositional visual grounding, trained independently on synthetic data, and adaptable for Sim-To-Real transfer in robotic applications.
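As a rough illustration of what "decoupled" can mean here, the sketch below keeps language parsing and scene reasoning in two separate functions that communicate only through a list of symbolic steps. It is a toy under stated assumptions, not the authors' implementation: the `SceneObject` schema, the keyword-based `parse` function, the `execute` and `ground` helpers, and the tiny vocabulary are all hypothetical. In the paper's setting, each part would be a learned module trained on synthetic scene graph annotations.

```python
from dataclasses import dataclass


@dataclass
class SceneObject:
    """One node of a toy scene graph (hypothetical schema, not from the paper)."""
    obj_id: int
    category: str
    color: str
    x: float  # horizontal position; smaller means further left


def parse(description):
    """Toy stand-in for a language parser: map known words to symbolic steps."""
    colors = {"red", "green", "blue"}
    categories = {"mug", "bowl", "can"}
    steps = []
    for word in description.lower().split():
        if word in colors:
            steps.append(("filter_color", word))
        elif word in categories:
            steps.append(("filter_category", word))
        elif word == "leftmost":
            steps.append(("select", "leftmost"))
    # Stable sort: run selection steps only after all filters have been applied.
    steps.sort(key=lambda step: step[0] == "select")
    return steps


def execute(steps, scene):
    """Run the symbolic steps over the scene graph; knows nothing about language."""
    candidates = list(scene)
    for op, arg in steps:
        if op == "filter_color":
            candidates = [o for o in candidates if o.color == arg]
        elif op == "filter_category":
            candidates = [o for o in candidates if o.category == arg]
        elif op == "select" and arg == "leftmost" and candidates:
            candidates = [min(candidates, key=lambda o: o.x)]
    return candidates


def ground(description, scene):
    """Return (object, None) if unique, else (None, message asking for help)."""
    candidates = execute(parse(description), scene)
    if len(candidates) == 1:
        return candidates[0], None
    if not candidates:
        return None, "No object matches; please rephrase."
    return None, f"{len(candidates)} objects match; please add detail."


scene = [
    SceneObject(0, "mug", "red", 0.1),
    SceneObject(1, "mug", "red", 0.7),
    SceneObject(2, "bowl", "blue", 0.4),
]
```

With this toy scene, `ground("the red mug", scene)` finds two candidates and returns a request for more detail, while `ground("the leftmost red mug", scene)` resolves to a single object. Because the two stages share only the step list, either one could be replaced or retrained without touching the other.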
Findings
Effective in simulation and real RGB-D datasets
Facilitates domain adaptation for visual recognition
Offers a data-efficient, robust, and interpretable solution
Abstract
Service robots should be able to interact naturally with non-expert human users, not only to help them in various tasks but also to receive guidance in order to resolve ambiguities that might be present in the instruction. We consider the task of visual grounding, where the agent segments an object from a crowded scene given a natural language description. Modern holistic approaches to visual grounding usually ignore language structure and struggle to cover generic domains, therefore relying heavily on large datasets. Additionally, their transfer performance in RGB-D datasets suffers due to high visual discrepancy between the benchmark and the target domains. Modular approaches marry learning with domain modeling and exploit the compositional nature of language to decouple visual representation from language parsing, but either rely on external parsers or are trained in an end-to-end…
Taxonomy
Topics: Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning · Advanced Image and Video Retrieval Techniques
