Interactive Visual Grounding of Referring Expressions for Human-Robot Interaction

Mohit Shridhar; David Hsu

arXiv:1806.03831·cs.RO·June 12, 2018

TL;DR

This paper introduces INGRESS, a robot system that interprets natural language instructions to identify and manipulate objects, using a novel two-stage neural network for grounding referring expressions and interactive disambiguation.

Contribution

The paper proposes a two-stage neural network model for grounding referring expressions and integrates interactive question asking, enabling flexible human-robot communication.

Findings

1. Outperforms the state of the art on the RefCOCO dataset
2. Successfully applied in robot experiments with humans
3. Handles unconstrained object categories and language expressions

Abstract

This paper presents INGRESS, a robot system that follows human natural language instructions to pick and place everyday objects. The core issue here is the grounding of referring expressions: infer objects and their relationships from input images and language expressions. INGRESS allows for unconstrained object categories and unconstrained language expressions. Further, it asks questions to disambiguate referring expressions interactively. To achieve these, we take the approach of grounding by generation and propose a two-stage neural network model for grounding. The first stage uses a neural network to generate visual descriptions of objects, compares them with the input language expression, and identifies a set of candidate objects. The second stage uses another neural network to examine all pairwise relations between the candidates and infers the most likely referred object. The…
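
As a rough illustration of the two-stage flow described in the abstract, here is a minimal Python sketch. It is not the INGRESS implementation: the paper uses neural networks for both stages, whereas this sketch substitutes a simple word-overlap score for the first stage and a hand-written x-coordinate check for the second. All function names, the object fields (name, description, x), the two supported relation phrases, and the top_k parameter are assumptions made for illustration only.

```python
from itertools import permutations


def overlap_score(description, expression):
    """Jaccard word overlap; a toy stand-in for the learned comparison
    between a generated object description and the input expression."""
    a = set(description.lower().split())
    b = set(expression.lower().split())
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def stage_one(objects, expression, top_k=2):
    """Stage 1 (sketch): rank objects by how well their description matches
    the expression and keep the top_k as candidates."""
    ranked = sorted(
        objects,
        key=lambda o: overlap_score(o["description"], expression),
        reverse=True,
    )
    return ranked[:top_k]


def relation_holds(a, b, relation):
    """Hypothetical spatial check that uses only x-coordinates."""
    if relation == "left of":
        return a["x"] < b["x"]
    if relation == "right of":
        return a["x"] > b["x"]
    return False


def stage_two(candidates, expression):
    """Stage 2 (sketch): examine all ordered candidate pairs and return the
    candidate that best satisfies the relations named in the expression.
    Assumes candidates is non-empty."""
    text = expression.lower()
    relations = [r for r in ("left of", "right of") if r in text]
    best, best_score = candidates[0], -1
    for a, b in permutations(candidates, 2):
        score = sum(relation_holds(a, b, r) for r in relations)
        if score > best_score:
            best, best_score = a, score
    return best


# Toy scene: two red cups and a blue bowl (made-up data).
objects = [
    {"name": "cup_right", "description": "red cup", "x": 0.8},
    {"name": "bowl", "description": "blue bowl", "x": 0.5},
    {"name": "cup_left", "description": "red cup", "x": 0.2},
]
expression = "red cup left of the other cup"

candidates = stage_one(objects, expression)
target = stage_two(candidates, expression)
print(target["name"])
```

In this toy scene the first stage cannot tell the two red cups apart, since both match the expression equally well; the pairwise check in the second stage is what separates them. The interactive question asking mentioned in the abstract is not modeled here.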
