ViSaRL: Visual Reinforcement Learning Guided by Human Saliency

Anthony Liang; Jesse Thomason; Erdem Bıyık

arXiv:2403.10940·cs.RO·October 22, 2024·1 cites

TL;DR

ViSaRL leverages human-like visual saliency to guide reinforcement learning, significantly enhancing sample efficiency, success rates, and robustness in robotic control tasks from pixel inputs.

Contribution

Introduces ViSaRL, a novel method integrating visual saliency into RL for improved learning efficiency and robustness across simulated and real robotic tasks.

Findings

01 Nearly doubles success rate on real-robot tasks.

02 Improves sample efficiency and generalization.

03 Robust to visual perturbations.

Abstract

Training robots to perform complex control tasks from high-dimensional pixel input using reinforcement learning (RL) is sample-inefficient because image observations consist primarily of task-irrelevant information. By contrast, humans are able to visually attend to task-relevant objects and areas. Based on this insight, we introduce Visual Saliency-Guided Reinforcement Learning (ViSaRL). Using ViSaRL to learn visual representations significantly improves the success rate, sample efficiency, and generalization of an RL agent on diverse tasks, including the DeepMind Control benchmark, robot manipulation in simulation, and robot manipulation on a real robot. We present approaches for incorporating saliency into both CNN- and Transformer-based encoders. We show that visual representations learned using ViSaRL are robust to various sources of visual perturbations including perceptual noise and scene…
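
The abstract does not say how saliency enters the encoder. One common way to expose a saliency map to a CNN encoder, assumed here purely for illustration and not confirmed by the text above, is to stack it onto the RGB observation as an extra input channel. The sketch below shows only that stacking step on nested lists; the function name `append_saliency_channel` and the toy data are hypothetical, and a real pipeline would use tensors and a learned or annotated saliency predictor.

```python
from typing import List

Image = List[List[List[float]]]   # H x W x C, values in [0, 1]
SaliencyMap = List[List[float]]   # H x W, values in [0, 1]


def append_saliency_channel(rgb: Image, saliency: SaliencyMap) -> Image:
    """Return an H x W x (C + 1) observation with saliency as the last channel.

    Assumption: saliency is fused by channel concatenation. This is an
    illustrative choice, not necessarily the method used in the paper.
    """
    if len(rgb) != len(saliency):
        raise ValueError("height mismatch between image and saliency map")
    fused = []
    for img_row, sal_row in zip(rgb, saliency):
        if len(img_row) != len(sal_row):
            raise ValueError("width mismatch between image and saliency map")
        # Copy each pixel and append its saliency value as a new channel.
        fused.append([list(pixel) + [s] for pixel, s in zip(img_row, sal_row)])
    return fused


# Toy 2 x 2 RGB observation; the saliency map marks only the top-left pixel.
rgb = [[[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]],
       [[0.7, 0.8, 0.9], [0.0, 0.0, 0.0]]]
saliency = [[1.0, 0.0],
            [0.0, 0.0]]
obs = append_saliency_channel(rgb, saliency)
```

Under this assumption, the encoder's first convolution would take four input channels instead of three, and the rest of the RL pipeline would be unchanged.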

Taxonomy

Topics: Visual Attention and Saliency Detection