Mid-Level Visual Representations Improve Generalization and Sample Efficiency for Learning Visuomotor Policies
Alexander Sax, Bradley Emi, Amir R. Zamir, Leonidas Guibas, Silvio Savarese, Jitendra Malik

TL;DR
Integrating mid-level visual perception skills into reinforcement learning significantly improves generalization and sample efficiency on visuomotor tasks. The approach outperforms end-to-end training and other feature methods, provided the perceptual features are carefully selected.
Contribution
This work demonstrates that mid-level perceptual skills improve reinforcement learning for visuomotor tasks and introduces an efficient feature set that replaces raw images for better performance.
Findings
Mid-level perception improves task generalization.
Sample efficiency increases with perceptual priors.
Careful selection of perceptual skills is crucial.
Abstract
How much does having visual priors about the world (e.g. the fact that the world is 3D) assist in learning to perform downstream motor tasks (e.g. delivering a package)? We study this question by integrating a generic perceptual skill set (e.g. a distance estimator, an edge detector, etc.) within a reinforcement learning framework--see Figure 1. This skill set (hereafter mid-level perception) provides the policy with a more processed state of the world compared to raw images. We find that using a mid-level perception confers significant advantages over training end-to-end from scratch (i.e. not leveraging priors) in navigation-oriented tasks. Agents are able to generalize to situations where the from-scratch approach fails and training becomes significantly more sample efficient. However, we show that realizing these gains requires careful selection of the mid-level perceptual skills.…
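The setup described in the abstract, in which the policy receives the output of a perceptual skill rather than raw pixels, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not the authors' implementation: `edge_features` is a hand-written finite-difference edge detector standing in for one mid-level skill, and `LinearPolicy` and `select_action` are hypothetical names for a toy, untrained policy.

```python
import random


def edge_features(image):
    """Hypothetical fixed mid-level skill: a crude edge detector.

    `image` is a list of rows of grayscale values. The output is a flat
    list of finite-difference edge magnitudes, one per pixel that has
    both a right and a lower neighbour.
    """
    height, width = len(image), len(image[0])
    features = []
    for y in range(height - 1):
        for x in range(width - 1):
            dx = image[y][x + 1] - image[y][x]  # horizontal change
            dy = image[y + 1][x] - image[y][x]  # vertical change
            features.append(abs(dx) + abs(dy))
    return features


class LinearPolicy:
    """Toy policy (an assumption): one linear score per action, argmax to act."""

    def __init__(self, n_features, n_actions, seed=0):
        rng = random.Random(seed)
        self.weights = [
            [rng.uniform(-0.1, 0.1) for _ in range(n_features)]
            for _ in range(n_actions)
        ]

    def act(self, features):
        scores = [
            sum(w * f for w, f in zip(row, features)) for row in self.weights
        ]
        # Index of the highest-scoring action (first index wins ties).
        return max(range(len(scores)), key=scores.__getitem__)


def select_action(image, policy):
    # The policy only ever sees the mid-level features, never raw pixels.
    return policy.act(edge_features(image))


if __name__ == "__main__":
    frame = [[0, 0, 1], [0, 0, 1], [0, 0, 1]]  # a vertical edge on the right
    policy = LinearPolicy(n_features=4, n_actions=3)
    print(edge_features(frame), select_action(frame, policy))
```

In a reinforcement learning loop, only the policy weights would be updated while the perceptual skill stays fixed. Swapping `edge_features` for a different skill changes what the policy can see, which is one way to read the abstract's point that the choice of skills matters.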
Taxonomy
Topics: Visual perception and processing mechanisms · Motor Control and Adaptation · Advanced Vision and Imaging
