Environment Predictive Coding for Embodied Agents

Santhosh K. Ramakrishnan; Tushar Nagarajan; Ziad Al-Halah; Kristen Grauman

arXiv:2102.02337·cs.CV·February 5, 2021·5 cites

Open Access

TL;DR

This paper presents environment predictive coding, a self-supervised method that learns environment-level representations for embodied agents from video by predicting masked trajectory segments, and shows that these representations transfer to navigation tasks in 3D environments.

Contribution

Introduces environment predictive coding, a novel self-supervised learning approach that encodes environment-level representations from agent trajectories for improved navigation.

Findings

01 Outperforms the state of the art on the Gibson and Matterport3D environments

02 Transfers effectively to multiple downstream navigation tasks

03 Achieves strong performance with only a limited budget of experience

Abstract

We introduce environment predictive coding, a self-supervised approach to learn environment-level representations for embodied agents. In contrast to prior work on self-supervised learning for images, we aim to jointly encode a series of images gathered by an agent as it moves about in 3D environments. We learn these representations via a zone prediction task, where we intelligently mask out portions of an agent's trajectory and predict them from the unmasked portions, conditioned on the agent's camera poses. By learning such representations on a collection of videos, we demonstrate successful transfer to multiple downstream navigation-oriented tasks. Our experiments on the photorealistic 3D environments of Gibson and Matterport3D show that our method outperforms the state-of-the-art on challenging tasks with only a limited budget of experience.
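
The masking step of the zone prediction task described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify how zones are chosen or how features are aggregated, so the fixed-length contiguous zones, the mean pooling, the `(x, y, heading)` pose format, and all function names here are assumptions made for illustration. The sketch only builds the inputs and target for one prediction; the model that maps the visible context and query pose to the hidden zone's features is left out.

```python
import random

def split_into_zones(num_frames, zone_len):
    """Partition frame indices into contiguous, non-overlapping zones (assumed scheme)."""
    return [list(range(s, min(s + zone_len, num_frames)))
            for s in range(0, num_frames, zone_len)]

def mean_vector(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def mask_zone(features, poses, zones, masked_idx):
    """Hide one zone's features and return (context, query_pose, target).

    context    : (feature, pose) pairs from all unmasked frames
    query_pose : mean pose of the hidden zone, i.e. where to predict
    target     : mean feature of the hidden zone, i.e. what to predict
    """
    hidden = zones[masked_idx]
    visible = [t for i, z in enumerate(zones) if i != masked_idx for t in z]
    context = [(features[t], poses[t]) for t in visible]
    query_pose = mean_vector([poses[t] for t in hidden])
    target = mean_vector([features[t] for t in hidden])
    return context, query_pose, target

# Toy trajectory: 12 frames with 4-dimensional stand-in image features.
random.seed(0)
T, D = 12, 4
features = [[random.gauss(0, 1) for _ in range(D)] for _ in range(T)]
poses = [[float(t), 0.0, 0.0] for t in range(T)]  # hypothetical (x, y, heading)

zones = split_into_zones(T, zone_len=4)
context, query_pose, target = mask_zone(features, poses, zones, masked_idx=1)
```

A self-supervised objective would then train a model to map `context` and `query_pose` to something close to `target`; the choice of loss is outside what the abstract states.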

Taxonomy

Topics: Human Pose and Action Recognition · Multimodal Machine Learning Applications · Reinforcement Learning in Robotics