Learning to Act by Predicting the Future
Alexey Dosovitskiy, Vladlen Koltun

TL;DR
This paper introduces a sensorimotor control method that learns to act in complex 3D environments by predicting future measurements from raw sensory input. It needs no fixed goal at training time, can pursue changing goals at test time, and uses no extraneous supervision.
Contribution
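It presents a supervised learning approach that uses the cotemporal structure of a high-dimensional sensory stream and a lower-dimensional measurement stream as the training signal, so a control model can be trained by interacting with the environment, without a fixed goal or extraneous supervision.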
Findings
Outperforms prior methods on challenging tasks
Successfully generalizes across environments and goals
A model trained with this approach won the Visual Doom AI Competition, which was held in previously unseen environments
Abstract
We present an approach to sensorimotor control in immersive environments. Our approach utilizes a high-dimensional sensory stream and a lower-dimensional measurement stream. The cotemporal structure of these streams provides a rich supervisory signal, which enables training a sensorimotor control model by interacting with the environment. The model is trained using supervised learning techniques, but without extraneous supervision. It learns to act based on raw sensory input from a complex three-dimensional environment. The presented formulation enables learning without a fixed goal at training time, and pursuing dynamically changing goals at test time. We conduct extensive experiments in three-dimensional simulations based on the classical first-person game Doom. The results demonstrate that the presented approach outperforms sophisticated prior formulations, particularly on…
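The abstract does not spell out the mechanics, so the following is only a rough sketch of one way the two streams could be used. It assumes things not stated above: that training targets are changes in the measurement vector at a few future time offsets, that a learned predictor (not shown) outputs one such predicted vector per action, and that a goal is a weight vector over the predicted entries. The function names, the example offsets, and the clamping at the end of an episode are hypothetical choices for illustration.

```python
from typing import List, Sequence


def future_targets(
    measurements: Sequence[Sequence[float]],
    t: int,
    offsets: Sequence[int],
) -> List[float]:
    """Build a regression target for step t from the measurement stream alone.

    For each offset k, the target holds the change in every measurement
    between step t and step t + k. Indices past the end of the episode are
    clamped to the last step (an assumption made for this sketch).
    """
    current = measurements[t]
    last = len(measurements) - 1
    target: List[float] = []
    for k in offsets:
        future = measurements[min(t + k, last)]
        target.extend(f - c for f, c in zip(future, current))
    return target


def choose_action(
    predictions: Sequence[Sequence[float]],
    goal: Sequence[float],
) -> int:
    """Return the index of the action with the highest goal-weighted prediction.

    predictions[a] is the predicted outcome vector for action a, and goal
    holds one weight per entry of that vector. Changing goal changes the
    behavior without retraining the predictor.
    """
    scores = [sum(g * p for g, p in zip(goal, pred)) for pred in predictions]
    return max(range(len(scores)), key=scores.__getitem__)


if __name__ == "__main__":
    # Hypothetical episode: each row is (health, kills) at one time step.
    episode = [[100.0, 0.0], [90.0, 1.0], [80.0, 1.0], [95.0, 2.0]]
    print(future_targets(episode, t=0, offsets=[1, 2]))

    # Hypothetical per-action predictions of (health change, kill change).
    predicted = [[-10.0, 1.0], [5.0, 0.0]]
    print(choose_action(predicted, goal=[1.0, 0.0]))  # goal: preserve health
    print(choose_action(predicted, goal=[0.0, 1.0]))  # goal: maximize kills
```

Because the goal only enters as a weighting at decision time, the same predictor can serve different objectives, which is one reading of how training without a fixed goal and switching goals at test time could fit together.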
Taxonomy
Topics: Reinforcement Learning in Robotics · Neural dynamics and brain function
