Following Instructions by Imagining and Reaching Visual Goals

John Kanu; Eadom Dessalene; Xiaomin Lin; Cornelia Fermuller; Yiannis Aloimonos

arXiv:2001.09373 · cs.LG · January 28, 2020 · 5 cites

Open Access

TL;DR

This paper introduces a reinforcement learning framework in which an agent performs temporally extended tasks by sequentially imagining visual goals and choosing actions to reach them. It operates on raw pixel images and assumes no prior linguistic or perceptual knowledge.

Contribution

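The work presents an approach to instruction following in RL based on visual goal imagination and spatial reasoning. It operates directly on raw pixel images, assumes no prior linguistic or perceptual knowledge, and learns via intrinsic motivation and a single extrinsic reward signal measuring task completion.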

Findings

01 Outperforms two flat architectures that use raw-pixel and ground-truth states.

02 Outperforms hierarchical architectures with ground-truth states on object arrangement tasks.

03 Validated in two environments with a robot arm in a simulated interactive 3D environment.

Abstract

While traditional methods for instruction-following typically assume prior linguistic and perceptual knowledge, many recent works in reinforcement learning (RL) have proposed learning policies end-to-end, typically by training neural networks to map joint representations of observations and instructions directly to actions. In this work, we present a novel framework for learning to perform temporally extended tasks using spatial reasoning in the RL framework, by sequentially imagining visual goals and choosing appropriate actions to fulfill imagined goals. Our framework operates on raw pixel images, assumes no prior linguistic or perceptual knowledge, and learns via intrinsic motivation and a single extrinsic reward signal measuring task completion. We validate our method in two environments with a robot arm in a simulated interactive 3D environment. Our method outperforms two flat…
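
The control loop described in the abstract (imagine a goal, act to reach it, repeat) can be illustrated with a toy sketch. This is not the authors' implementation: the function names (`imagine_goal`, `act`, `intrinsic_reward`, `run_episode`), the scalar state, and the hand-coded policies are all assumptions made for illustration. In the paper the goals are imagined images and the policies are learned networks; here a single number stands in for the scene, and the intrinsic reward is assumed to be the negative distance to the imagined goal.

```python
def imagine_goal(state, target, step=2.0):
    """Hypothetical high-level policy: propose a subgoal at most `step` away,
    in the direction of the task target."""
    delta = target - state
    return state + max(-step, min(step, delta))


def act(state, goal, max_action=0.5):
    """Hypothetical low-level policy: take a bounded step toward the imagined goal."""
    delta = goal - state
    return max(-max_action, min(max_action, delta))


def intrinsic_reward(state, goal):
    """Assumed intrinsic signal: negative distance to the imagined goal."""
    return -abs(goal - state)


def run_episode(start=0.0, target=5.0, horizon=10, max_subgoals=5, tol=1e-6):
    """Alternate between imagining a subgoal and acting to reach it."""
    state = start
    intrinsic_total = 0.0
    subgoals = []
    for _ in range(max_subgoals):
        goal = imagine_goal(state, target)
        subgoals.append(goal)
        for _ in range(horizon):
            state += act(state, goal)
            intrinsic_total += intrinsic_reward(state, goal)
            if abs(goal - state) <= tol:
                break  # imagined goal reached; ask for the next one
        if abs(target - state) <= tol:
            break  # task complete
    # Single extrinsic reward: 1 only if the overall task was completed.
    extrinsic = 1.0 if abs(target - state) <= tol else 0.0
    return state, subgoals, intrinsic_total, extrinsic


if __name__ == "__main__":
    final_state, subgoals, intrinsic_total, extrinsic = run_episode()
    print("subgoals:", subgoals)
    print("final state:", final_state)
    print("intrinsic return:", intrinsic_total, "extrinsic reward:", extrinsic)
```

The split mirrors the two reward sources named in the abstract: the low-level controller is scored only against the goal it was handed, while the extrinsic signal is issued once and reflects task completion alone.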

Taxonomy

Topics: Visual Attention and Saliency Detection · Multimodal Machine Learning Applications · Advanced Vision and Imaging