Few-Shot Visual Grounding for Natural Human-Robot Interaction

Giorgos Tziafas; Hamidreza Kasaei

arXiv:2103.09720 · cs.CV · April 1, 2021


TL;DR

This paper introduces a single-stage zero-shot deep neural network for visual grounding in human-robot interaction. It lets a robot segment an object that a user refers to verbally in a crowded scene, including objects it has not seen during training.

Contribution

The paper presents a single-stage zero-shot visual grounding model that outperforms two-stage methods relying on pre-trained object detectors, supporting real-time understanding in dynamic environments.

Findings

01. High accuracy and speed on real RGB-D data
02. Robustness to natural language variation
03. Effectiveness in crowded scenes

Abstract

Natural Human-Robot Interaction (HRI) is one of the key components that allow service robots to work in human-centric environments. In such dynamic environments, the robot needs to understand the intention of the user to accomplish a task successfully. To address this, we propose a software architecture that segments a target object, indicated verbally by a human user, from a crowded scene. At the core of our system, we employ a multi-modal deep neural network for visual grounding. Unlike most grounding methods, which tackle the challenge in a two-step process built on pre-trained object detectors, we develop a single-stage zero-shot model that is able to provide predictions on unseen data. We evaluate the performance of the proposed model on real RGB-D data collected from public scene datasets. Experimental results showed that the proposed model performs well in terms…
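The contrast between single-stage and two-step grounding can be illustrated with a toy sketch. This is not the authors' network. It only shows the general single-stage idea: every location of a dense visual feature map is scored directly against an embedding of the spoken phrase, and the scores are thresholded into a segmentation mask, with no proposals from a separate object detector. The function name `ground_single_stage`, the dot-product scoring, the sigmoid and the 0.5 threshold are all assumptions made for illustration. In a real model, the features and the phrase embedding would come from learned visual and language encoders rather than being hand-picked.

```python
import math
from typing import List

Vector = List[float]
FeatureMap = List[List[Vector]]  # H x W grid of C-dimensional feature vectors


def sigmoid(x: float) -> float:
    """Squash a raw matching score into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def ground_single_stage(features: FeatureMap, text_embedding: Vector,
                        threshold: float = 0.5) -> List[List[int]]:
    """Hypothetical single-stage grounding: score each pixel against the phrase.

    Each pixel's feature vector is compared with the phrase embedding by a
    dot product; pixels whose sigmoid score reaches `threshold` are marked 1.
    No detector proposals are involved, so the mask comes from one pass.
    """
    mask: List[List[int]] = []
    for row in features:
        mask_row: List[int] = []
        for pixel in row:
            if len(pixel) != len(text_embedding):
                raise ValueError("feature and text embedding dimensions must match")
            score = sum(p * t for p, t in zip(pixel, text_embedding))
            mask_row.append(1 if sigmoid(score) >= threshold else 0)
        mask.append(mask_row)
    return mask


# Toy 2x2 feature map with 2 channels and a hand-picked phrase embedding.
features = [[[2.0, 0.0], [0.0, 2.0]],
            [[-1.0, 0.0], [1.5, 0.5]]]
phrase = [1.0, -0.5]  # stands in for an encoded phrase such as "the red mug"
print(ground_single_stage(features, phrase))  # [[1, 0], [0, 1]]
```

Raising `threshold` makes the mask more conservative. With `threshold=0.9`, neither positive pixel (sigmoid scores of about 0.88 and 0.78) survives.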
