Enhanced Scene Specificity with Sparse Dynamic Value Estimation
Jaskirat Singh, Liang Zheng

TL;DR
This paper introduces a sparse dynamic value estimation method for multi-scene reinforcement learning. By enforcing sparse scene-specific value clusters, it reduces policy gradient variance, which yields higher final rewards and more efficient navigation.
Contribution
It proposes a novel sparse clustering approach for dynamic value estimation in multi-scene RL, enhancing value function accuracy and agent performance.
Findings
Significant improvements in final reward scores across ProcGen environments.
Increased navigation efficiency in game level completion.
Reduced policy gradient variance through sparse clustering.
Abstract
Multi-scene reinforcement learning involves training the RL agent across multiple scenes / levels from the same task, and has become essential for many generalization applications. However, the inclusion of multiple scenes leads to an increase in sample variance for policy gradient computations, often resulting in suboptimal performance with the direct application of traditional methods (e.g. PPO, A3C). One strategy for variance reduction is to consider each scene as a distinct Markov decision process (MDP) and learn a joint value function dependent on both state (s) and MDP (M). However, this is non-trivial as the agent is usually unaware of the underlying level at train / test times in multi-scene RL. Recently, Singh et al. [1] tried to address this by proposing a dynamic value estimation approach that models the true joint value function distribution as a Gaussian mixture model…
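The variance argument in the abstract can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions, not the authors' method: the two scenes, their mean returns, and the helper `advantage_variance` are hypothetical, and a per-scene mean return stands in for a value function that depends on the MDP. It only shows why a baseline that knows the scene leaves less variance in the advantage terms than a single baseline shared across scenes.

```python
import random
import statistics


def advantage_variance(returns_by_scene, per_scene_baseline):
    """Population variance of (return - baseline) over all sampled returns.

    Hypothetical helper for illustration only. With per_scene_baseline=True
    the baseline is each scene's own mean return; otherwise it is one mean
    shared by every scene.
    """
    all_returns = [r for rs in returns_by_scene.values() for r in rs]
    shared_baseline = statistics.fmean(all_returns)
    advantages = []
    for rs in returns_by_scene.values():
        baseline = statistics.fmean(rs) if per_scene_baseline else shared_baseline
        advantages.extend(r - baseline for r in rs)
    return statistics.pvariance(advantages)


# Assumed toy setup: two scenes whose returns differ mainly in their mean.
rng = random.Random(0)
scene_means = {"scene_a": 2.0, "scene_b": 10.0}
returns_by_scene = {
    name: [rng.gauss(mu, 1.0) for _ in range(1000)]
    for name, mu in scene_means.items()
}

var_shared = advantage_variance(returns_by_scene, per_scene_baseline=False)
var_scene = advantage_variance(returns_by_scene, per_scene_baseline=True)
print(f"shared baseline variance:    {var_shared:.2f}")
print(f"per-scene baseline variance: {var_scene:.2f}")
```

In this toy setting the shared baseline has to absorb the gap between the two scene means, so its advantages spread much wider than the per-scene ones. In the multi-scene setting described above the agent is not told which scene it is in, which is why the scene-dependent baseline cannot simply be looked up as it is here.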
Taxonomy
Topics: Advanced Vision and Imaging · Reinforcement Learning in Robotics · Model Reduction and Neural Networks
Methods: Entropy Regularization · Proximal Policy Optimization
