Enhanced Scene Specificity with Sparse Dynamic Value Estimation

Jaskirat Singh; Liang Zheng

arXiv:2011.12574 · cs.LG · November 26, 2020

TL;DR

This paper introduces a sparse dynamic value estimation method for multi-scene reinforcement learning. Enforcing sparse, scene-specific value clusters reduces policy gradient variance, which in turn yields higher final rewards and more efficient navigation.

Contribution

It proposes a novel sparse clustering approach for dynamic value estimation in multi-scene RL, enhancing value function accuracy and agent performance.

Findings

01. Significant improvements in final reward scores across ProcGen environments.

02. Increased navigation efficiency in game level completion.

03. Reduced policy gradient variance through sparse clustering.

Abstract

Multi-scene reinforcement learning involves training the RL agent across multiple scenes / levels from the same task, and has become essential for many generalization applications. However, the inclusion of multiple scenes leads to an increase in sample variance for policy gradient computations, often resulting in suboptimal performance with the direct application of traditional methods (e.g. PPO, A3C). One strategy for variance reduction is to consider each scene as a distinct Markov decision process (MDP) and learn a joint value function dependent on both state (s) and MDP (M). However, this is non-trivial as the agent is usually unaware of the underlying level at train / test times in multi-scene RL. Recently, Singh et al. [1] tried to address this by proposing a dynamic value estimation approach that models the true joint value function distribution as a Gaussian mixture model…
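
The variance argument in the abstract can be illustrated with a toy experiment. The sketch below is not the authors' method: it assumes a single state, a hypothetical set of scenes whose returns differ only in their mean, and an oracle that knows which scene each sample came from. It compares the variance of the advantage (return minus baseline) under one shared baseline against a per-scene baseline, which stands in for a joint value function V(s, M). All names (`baseline_variance_demo`, `scene_means`, and so on) are illustrative.

```python
import numpy as np


def baseline_variance_demo(num_scenes=5, samples_per_scene=2000, seed=0):
    """Compare advantage variance with a shared vs. a per-scene baseline."""
    rng = np.random.default_rng(seed)

    # Assumption: each scene (MDP) has its own mean return at the same state.
    scene_means = rng.uniform(0.0, 10.0, size=num_scenes)
    scene_ids = np.repeat(np.arange(num_scenes), samples_per_scene)

    # Sampled returns: scene mean plus unit-variance noise.
    returns = scene_means[scene_ids] + rng.normal(0.0, 1.0, size=scene_ids.size)

    # Scene-agnostic baseline: one value estimate for all scenes, V(s).
    shared_baseline = returns.mean()
    adv_shared = returns - shared_baseline

    # Scene-aware baseline: one value estimate per scene, V(s, M).
    per_scene_baseline = np.array(
        [returns[scene_ids == k].mean() for k in range(num_scenes)]
    )
    adv_scene = returns - per_scene_baseline[scene_ids]

    return adv_shared.var(), adv_scene.var()


var_shared, var_scene = baseline_variance_demo()
print(f"advantage variance, shared baseline:    {var_shared:.3f}")
print(f"advantage variance, per-scene baseline: {var_scene:.3f}")
```

With a shared baseline, the advantage variance includes the spread between scene means as well as the within-scene noise; the per-scene baseline removes the between-scene term. In practice the agent does not observe the scene identity, which is the gap that dynamic value estimation is meant to close.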

Taxonomy

Topics: Advanced Vision and Imaging · Reinforcement Learning in Robotics · Model Reduction and Neural Networks

Methods: Entropy Regularization · Proximal Policy Optimization