RBED: Reward Based Epsilon Decay
Aakash Maroti

TL;DR
This paper introduces a reward-based epsilon decay method for reinforcement learning, which adapts exploration decay based on environment feedback, leading to more consistent and improved performance over standard exponential decay.
Contribution
It proposes a novel reward-based epsilon decay strategy that dynamically adjusts exploration based on feedback, enhancing learning efficiency.
Findings
Reward-based decay yields more consistent results across runs in the environments tested.
On average, it outperforms standard exponential decay in those environments.
The approach adapts exploration to environment feedback.
Abstract
ε-greedy is a policy used to balance exploration and exploitation in many reinforcement learning settings. In cases where the agent uses some on-policy algorithm to learn optimal behaviour, it makes sense for the agent to explore more initially and eventually exploit more as it approaches the target behaviour. This shift from heavy exploration to heavy exploitation can be represented as decay in the ε value, where ε represents how much an agent is allowed to explore. This paper proposes a new approach to this decay, in which the decay is driven by feedback from the environment. It then evaluates one such approach, based on rewards, against standard exponential decay. In the environments tested, the new approach produces more consistent results that on average perform better.
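The abstract does not state the exact update rule, so the following is only a minimal sketch of the idea under stated assumptions: ε is reduced only when an episode's reward reaches a threshold, and the threshold is then raised. The names `RewardBasedDecay`, `reward_threshold`, `threshold_step`, and `decay_factor`, as well as all default values, are hypothetical and chosen for illustration. A standard exponential decay is included for comparison.

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def exponential_decay(epsilon, rate=0.99, min_epsilon=0.01):
    """Baseline: shrink epsilon by a fixed factor after every episode."""
    return max(min_epsilon, epsilon * rate)


class RewardBasedDecay:
    """Assumed rule: decay epsilon only when the agent meets a reward target.

    All parameter names and defaults are illustrative, not from the paper.
    """

    def __init__(self, epsilon=1.0, min_epsilon=0.01,
                 reward_threshold=0.0, threshold_step=1.0, decay_factor=0.9):
        self.epsilon = epsilon
        self.min_epsilon = min_epsilon
        self.reward_threshold = reward_threshold
        self.threshold_step = threshold_step
        self.decay_factor = decay_factor

    def update(self, episode_reward):
        """Call once per episode with that episode's total reward."""
        if (episode_reward >= self.reward_threshold
                and self.epsilon > self.min_epsilon):
            # The agent earned the next reduction in exploration.
            self.epsilon = max(self.min_epsilon,
                               self.epsilon * self.decay_factor)
            # Raise the bar so the next reduction needs a better episode.
            self.reward_threshold += self.threshold_step
        return self.epsilon
```

Under this sketch, an agent that stops improving keeps its current ε instead of drifting toward pure exploitation on a fixed schedule, which is one way exploration can follow environment feedback.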
Taxonomy
Topics: Reinforcement Learning in Robotics · Optimization and Search Problems · Advanced Bandit Algorithms Research
