Environment Predictive Coding for Embodied Agents
Santhosh K. Ramakrishnan, Tushar Nagarajan, Ziad Al-Halah, Kristen Grauman

TL;DR
This paper presents environment predictive coding, a self-supervised method by which embodied agents learn environment representations from videos by predicting masked trajectory segments; the learned representations improve performance on navigation tasks in 3D environments.
Contribution
Introduces environment predictive coding, a self-supervised learning approach that encodes environment-level representations from agent trajectories and transfers them to downstream navigation tasks.
Findings
Outperforms state-of-the-art on Gibson and Matterport3D environments
Effective transfer to multiple navigation tasks
Requires limited experience for strong performance
Abstract
We introduce environment predictive coding, a self-supervised approach to learn environment-level representations for embodied agents. In contrast to prior work on self-supervised learning for images, we aim to jointly encode a series of images gathered by an agent as it moves about in 3D environments. We learn these representations via a zone prediction task, where we intelligently mask out portions of an agent's trajectory and predict them from the unmasked portions, conditioned on the agent's camera poses. By learning such representations on a collection of videos, we demonstrate successful transfer to multiple downstream navigation-oriented tasks. Our experiments on the photorealistic 3D environments of Gibson and Matterport3D show that our method outperforms the state-of-the-art on challenging tasks with only a limited budget of experience.
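The zone prediction setup described in the abstract can be illustrated with a minimal data-preparation sketch. It is not the authors' implementation: the abstract says portions of the trajectory are masked "intelligently", whereas this sketch assumes fixed-length contiguous chunks as zones and picks the masked ones at random. The names `split_into_zones` and `make_zone_prediction_example`, the `(image_id, (x, y))` frame layout, and the use of a zone's centroid as its query pose are all hypothetical choices made for illustration.

```python
import random
from typing import List, Tuple

# Hypothetical frame layout: (image id, (x, y) camera position).
Pose = Tuple[float, float]
Frame = Tuple[str, Pose]


def split_into_zones(trajectory: List[Frame], zone_len: int) -> List[List[Frame]]:
    """Chunk a trajectory into contiguous, non-overlapping zones.

    Assumption: fixed-length chunks stand in for the paper's zone selection.
    """
    return [trajectory[i:i + zone_len] for i in range(0, len(trajectory), zone_len)]


def make_zone_prediction_example(
    trajectory: List[Frame],
    zone_len: int,
    num_masked: int,
    rng: random.Random,
) -> Tuple[List[Frame], List[Tuple[Pose, List[str]]]]:
    """Build one self-supervised example from a single trajectory.

    Returns the visible frames (model input) and, for each masked zone,
    a (query pose, hidden image ids) pair. A model would be asked to predict
    features of the hidden images from the visible frames and the query pose.
    """
    zones = split_into_zones(trajectory, zone_len)
    if not 0 < num_masked < len(zones):
        raise ValueError("num_masked must leave at least one visible zone")

    masked_ids = set(rng.sample(range(len(zones)), num_masked))

    # Frames from unmasked zones keep both image and pose.
    visible = [f for i, zone in enumerate(zones) if i not in masked_ids for f in zone]

    targets = []
    for i in sorted(masked_ids):
        zone = zones[i]
        n = len(zone)
        # Assumption: summarize the masked zone's camera poses by their centroid.
        query = (
            sum(pose[0] for _, pose in zone) / n,
            sum(pose[1] for _, pose in zone) / n,
        )
        targets.append((query, [img for img, _ in zone]))
    return visible, targets


if __name__ == "__main__":
    # Toy trajectory: 12 frames moving along the x axis.
    traj = [(f"img{i}", (float(i), 0.0)) for i in range(12)]
    visible, targets = make_zone_prediction_example(
        traj, zone_len=3, num_masked=1, rng=random.Random(0)
    )
    print(len(visible), "visible frames;", len(targets), "masked zone(s)")
```

The key property the sketch preserves is that the images of a masked zone never appear in the model input, while the zone's pose information is still supplied as the prediction query.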
Taxonomy
Topics: Human Pose and Action Recognition · Multimodal Machine Learning Applications · Reinforcement Learning in Robotics
