Adaptive Variance for Changing Sparse-Reward Environments
Xingyu Lin, Pengsheng Guo, Carlos Florensa, David Held

TL;DR
This paper introduces a method to adapt the exploration variance of policies in changing sparse-reward environments, improving robot adaptability without explicitly modeling environmental changes.
Contribution
It provides a theoretical framework linking value functions to exploration variance, enabling effective policy adaptation in dynamic environments.
Findings
The proposed variance adjustment strategy improves exploration in changing environments.
The method enables faster adaptation than fixed-variance policies.
The approach is effective across various sparse-reward scenarios.
Abstract
Robots that are trained to perform a task in a fixed environment often fail when facing unexpected changes to the environment due to a lack of exploration. We propose a principled way to adapt the policy for better exploration in changing sparse-reward environments. Unlike previous works, which explicitly model environmental changes, we analyze the relationship between the value function and the optimal exploration for a Gaussian-parameterized policy and show that our theory leads to an effective strategy for adjusting the variance of the policy, enabling fast adaptation to changes in a variety of sparse-reward environments.
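To make the idea of value-dependent exploration concrete, here is a minimal sketch of a Gaussian policy whose standard deviation grows as the estimated value drops. It is not the rule derived in the paper, which this summary does not state. The linear interpolation, the function names `adaptive_std` and `sample_action`, and the parameters `v_max`, `std_min`, and `std_max` are all assumptions chosen for illustration.

```python
import numpy as np


def adaptive_std(value_estimate, v_max=1.0, std_min=0.05, std_max=1.0):
    """Hypothetical heuristic, not the paper's derived rule.

    Linearly interpolates the exploration standard deviation between
    std_max (value estimate near 0) and std_min (value estimate near v_max).
    """
    # Fraction of the best achievable return that the policy currently expects.
    confidence = np.clip(value_estimate / v_max, 0.0, 1.0)
    return std_max - confidence * (std_max - std_min)


def sample_action(mean, value_estimate, rng, **std_kwargs):
    """Sample from a Gaussian policy whose spread depends on the value estimate."""
    std = adaptive_std(value_estimate, **std_kwargs)
    return rng.normal(loc=mean, scale=std)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    mean_action = np.zeros(2)
    # High value estimate: the environment looks unchanged, so explore little.
    print(sample_action(mean_action, value_estimate=0.9, rng=rng))
    # Low value estimate: the task may have changed, so explore widely.
    print(sample_action(mean_action, value_estimate=0.1, rng=rng))
```

In a sparse-reward task with returns in [0, 1], a sudden fall in the value estimate after an environment change would widen the action distribution under this heuristic, while a recovered value estimate would narrow it again.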
Taxonomy
Topics: Reinforcement Learning in Robotics · Gaussian Processes and Bayesian Inference · Advanced Bandit Algorithms Research
