Generalization in Visual Reinforcement Learning with the Reward Sequence Distribution
Jie Wang, Rui Yang, Zijie Geng, Zhihao Shi, Mingxuan Ye, Qi Zhou, Shuiwang Ji, Bin Li, Yongdong Zhang, and Feng Wu

TL;DR
This paper introduces RSD-OA, the reward sequence distribution conditioned on the starting observation and the subsequent actions. Learning from it improves generalization in visual reinforcement learning because reward sequences carry task-relevant information and are unaffected by visual distractions.
Contribution
The paper proposes RSD-OA, a novel reward sequence distribution approach that is invariant to visual distractions and captures long-term task-relevant information for better generalization.
Findings
RSD-OA significantly outperforms state-of-the-art methods on DeepMind Control tasks with visual distractions.
The approach improves generalization to unseen environments in visual reinforcement learning (VRL).
RSD-OA effectively isolates task-relevant information from visual noise.
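
The invariance claim in these findings can be illustrated with a toy sketch. This is not the authors' implementation: the one-dimensional latent state, the slip probability, the goal position, and the function names `rollout_rewards` and `empirical_rsd` are all hypothetical, chosen only to show that when rewards depend on the latent state alone, the distribution of reward sequences under a fixed action sequence does not change when the visual distractor changes.

```python
import random
from collections import Counter

GOAL = 3  # hypothetical goal position in the toy latent space


def rollout_rewards(obs, actions, rng):
    """Roll out a fixed action sequence and return the reward sequence.

    obs is (latent_position, distractor). Only the latent part drives the
    dynamics and rewards; that is the assumption being illustrated.
    """
    position, _distractor = obs
    rewards = []
    for action in actions:
        # With probability 0.2 the move "slips" and the agent stays put.
        if rng.random() >= 0.2:
            position += action
        rewards.append(1.0 if position == GOAL else 0.0)
    return tuple(rewards)


def empirical_rsd(obs, actions, n_samples=2000, seed=0):
    """Estimate the reward sequence distribution for (obs, actions)."""
    rng = random.Random(seed)
    counts = Counter(
        rollout_rewards(obs, actions, rng) for _ in range(n_samples)
    )
    return {seq: count / n_samples for seq, count in counts.items()}


actions = [1, 1, 1, -1]
clean = (0, "plain background")
distracted = (0, "moving video background")  # same latent, new distractor
other_state = (2, "plain background")        # different latent state

rsd_clean = empirical_rsd(clean, actions)
rsd_distracted = empirical_rsd(distracted, actions)
rsd_other = empirical_rsd(other_state, actions)

print(rsd_clean == rsd_distracted)  # same latent state, same distribution
print(rsd_clean == rsd_other)       # different latent state
```

In this toy, the two observations that share a latent state yield the same estimated distribution, while the observation with a different latent state does not. A real agent sees only pixels, so it would have to learn a representation from which such reward sequences can be predicted; the sketch sidesteps that step by reading the latent state directly.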
Abstract
Generalization in partially observed Markov decision processes (POMDPs) is critical for successful applications of visual reinforcement learning (VRL) in real scenarios. A widely used idea is to learn task-relevant representations that encode task-relevant information of common features in POMDPs, i.e., rewards and transition dynamics. As transition dynamics in the latent state space -- which are task-relevant and invariant to visual distractions -- are unknown to the agents, existing methods instead use transition dynamics in the observation space to extract task-relevant information in transition dynamics. However, such transition dynamics in the observation space involve task-irrelevant visual distractions, degrading the generalization performance of VRL methods. To tackle this problem, we propose the reward sequence distribution conditioned on the starting observation and the…
Taxonomy
Topics: Reinforcement Learning in Robotics · Neural dynamics and brain function · Neural Networks and Reservoir Computing
