Generalization in Visual Reinforcement Learning with the Reward Sequence Distribution

Jie Wang, Rui Yang, Zijie Geng, Zhihao Shi, Mingxuan Ye, Qi Zhou, Shuiwang Ji, Bin Li, Yongdong Zhang, and Feng Wu

arXiv:2302.09601 · cs.LG · February 21, 2023

TL;DR

This paper introduces RSD-OA, a reward sequence distribution method conditioned on actions, which improves generalization in visual reinforcement learning by focusing on task-relevant information and ignoring visual distractions.

Contribution

The paper proposes RSD-OA, a novel reward sequence distribution approach that is invariant to visual distractions and captures long-term task-relevant information for better generalization.

Findings

01 RSD-OA significantly outperforms state-of-the-art methods on DeepMind Control tasks with visual distractions.

02 The approach improves generalization to unseen environments in visual reinforcement learning (VRL).

03 RSD-OA effectively isolates task-relevant information from visual noise.

Abstract

Generalization in partially observed Markov decision processes (POMDPs) is critical for successful applications of visual reinforcement learning (VRL) in real scenarios. A widely used idea is to learn task-relevant representations that encode task-relevant information of common features in POMDPs, i.e., rewards and transition dynamics. As transition dynamics in the latent state space -- which are task-relevant and invariant to visual distractions -- are unknown to the agents, existing methods alternatively use transition dynamics in the observation space to extract task-relevant information in transition dynamics. However, such transition dynamics in the observation space involve task-irrelevant visual distractions, degrading the generalization performance of VRL methods. To tackle this problem, we propose the reward sequence distribution conditioned on the starting observation and the…
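
The invariance idea in the abstract can be illustrated with a toy example. The sketch below is not the authors' method or code: the 5-cell line world, `SLIP_PROB`, `GOAL`, and both function names are hypothetical, and the observation is assumed to be a `(latent_state, background)` pair so that the true distribution can be enumerated exactly. It shows that, for a fixed action sequence, the distribution over reward sequences depends on the latent state only, so two observations that differ just in their background share the same distribution.

```python
from collections import defaultdict
from itertools import product

SLIP_PROB = 0.2  # hypothetical: probability that an action has no effect
GOAL = 4         # hypothetical: rightmost cell of a 5-cell line world


def step(state, action, slipped):
    """Toy latent dynamics: move along the line unless the action slips."""
    if not slipped:
        state = max(0, min(GOAL, state + action))
    reward = 1.0 if state == GOAL else 0.0
    return state, reward


def reward_sequence_distribution(observation, actions):
    """Exact distribution over reward sequences for a fixed action sequence.

    Assumption: the observation is (latent_state, background) and only the
    latent state influences rewards; the background is a visual distraction.
    """
    state, _background = observation
    dist = defaultdict(float)
    # Enumerate every slip / no-slip pattern and accumulate its probability.
    for slips in product([False, True], repeat=len(actions)):
        s, prob, rewards = state, 1.0, []
        for action, slipped in zip(actions, slips):
            prob *= SLIP_PROB if slipped else 1.0 - SLIP_PROB
            s, r = step(s, action, slipped)
            rewards.append(r)
        dist[tuple(rewards)] += prob
    return dict(dist)


actions = [1, 1, 1]
clean = (2, "plain background")
distracted = (2, "moving video background")
other = (0, "plain background")

d_clean = reward_sequence_distribution(clean, actions)
d_distracted = reward_sequence_distribution(distracted, actions)
d_other = reward_sequence_distribution(other, actions)

print(d_clean == d_distracted)  # same latent state, different background
print(d_clean == d_other)       # different latent state, same background
```

In a real POMDP the agent cannot read the latent state from the observation, which is why a representation would have to be learned; the oracle above only makes the target quantity concrete.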

Code & Models

Repositories

miralab-ustc/rl-cresp (PyTorch, official)

Taxonomy

Topics: Reinforcement Learning in Robotics · Neural dynamics and brain function · Neural Networks and Reservoir Computing