Mid-Level Visual Representations Improve Generalization and Sample Efficiency for Learning Visuomotor Policies

Alexander Sax; Bradley Emi; Amir R. Zamir; Leonidas Guibas; Silvio Savarese; Jitendra Malik

arXiv:1812.11971·cs.CV·April 23, 2019·49 cites

TL;DR

Integrating mid-level visual perception skills into reinforcement learning significantly enhances generalization and sample efficiency in visuomotor tasks, provided the perceptual features are carefully selected, outperforming end-to-end training and other feature methods.

Contribution

This work demonstrates that mid-level perceptual skills improve reinforcement learning for visuomotor tasks and introduces an efficient feature set that replaces raw images for better performance.

Findings

01. Mid-level perception improves task generalization.

02. Sample efficiency increases with perceptual priors.

03. Careful selection of perceptual skills is crucial.

Abstract

How much does having visual priors about the world (e.g. the fact that the world is 3D) assist in learning to perform downstream motor tasks (e.g. delivering a package)? We study this question by integrating a generic perceptual skill set (e.g. a distance estimator, an edge detector, etc.) within a reinforcement learning framework--see Figure 1. This skill set (hereafter mid-level perception) provides the policy with a more processed state of the world compared to raw images. We find that using a mid-level perception confers significant advantages over training end-to-end from scratch (i.e. not leveraging priors) in navigation-oriented tasks. Agents are able to generalize to situations where the from-scratch approach fails and training becomes significantly more sample efficient. However, we show that realizing these gains requires careful selection of the mid-level perceptual skills.…
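The idea in the abstract, a policy that reads the output of a fixed perceptual skill rather than raw pixels, can be illustrated with a toy sketch. This is not the authors' implementation: `edge_features`, `raw_features`, `LinearPolicy`, and `make_agent` are hypothetical names, the "edge detector" is a hand-written horizontal gradient rather than a pretrained network, and the policy weights are set by hand rather than learned with reinforcement learning.

```python
from typing import Callable, List, Sequence

Image = List[List[float]]  # grayscale image as rows of pixel intensities


def edge_features(image: Image) -> List[float]:
    """Fixed, non-learned stand-in for a mid-level skill (assumption):
    absolute horizontal differences between neighbouring pixels."""
    return [abs(row[i + 1] - row[i]) for row in image for i in range(len(row) - 1)]


def raw_features(image: Image) -> List[float]:
    """Baseline input: flattened raw pixels, as a from-scratch policy would see them."""
    return [pixel for row in image for pixel in row]


class LinearPolicy:
    """Tiny policy head: scores each action as a dot product with the features."""

    def __init__(self, weights: Sequence[Sequence[float]]):
        self.weights = weights  # one weight vector per action

    def act(self, features: Sequence[float]) -> int:
        scores = [sum(w * f for w, f in zip(ws, features)) for ws in self.weights]
        # Index of the highest-scoring action; ties go to the lowest index.
        return max(range(len(scores)), key=scores.__getitem__)


def make_agent(encoder: Callable[[Image], List[float]],
               policy: LinearPolicy) -> Callable[[Image], int]:
    """Compose a frozen encoder with a policy head."""
    return lambda image: policy.act(encoder(image))


# Hand-set weights (assumption): action 1 when edges are present, else action 0.
agent = make_agent(edge_features, LinearPolicy([[-1] * 4, [1] * 4]))

wall_ahead = [[0, 0, 1], [0, 0, 1]]
brighter_wall = [[5, 5, 6], [5, 5, 6]]   # same scene, uniformly brighter
open_space = [[0, 0, 0], [0, 0, 0]]

print(agent(wall_ahead), agent(brighter_wall), agent(open_space))
```

In this toy setup the edge features are identical for the two wall images, so the policy picks the same action under the brightness shift, whereas `raw_features` would hand the policy different inputs for the same scene. That is only an intuition for why a more processed state might help; the paper's actual feature set and results are not reproduced here.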

Code & Models

Repositories

alexsax/midlevel-reps
PyTorch · Official

Taxonomy

Topics: Visual perception and processing mechanisms · Motor Control and Adaptation · Advanced Vision and Imaging