Exploring Unknown States with Action Balance
Yan Song, Yingfeng Chen, Yujing Hu, Changjie Fan

TL;DR
This paper introduces action balance exploration, an extension of UCB for deep reinforcement learning, which improves the discovery of unknown states and enhances exploration efficiency in challenging environments.
Contribution
It proposes a novel action balance exploration method and combines it with RND to better explore unknown states in reinforcement learning environments.
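For readers unfamiliar with RND (random network distillation): a predictor is trained to match the output of a fixed, randomly initialized target network, and the prediction error on a state serves as its novelty bonus. The sketch below illustrates that idea only. It is not the paper's implementation: it swaps the usual neural networks for a linear predictor and a one-layer tanh target, and the class name `TinyRND`, its parameters, and the one-hot states are all hypothetical.

```python
import math
import random


class TinyRND:
    """Toy RND-style novelty bonus (illustrative; names and sizes are assumptions)."""

    def __init__(self, state_dim, feat_dim=4, lr=0.1, seed=0):
        rng = random.Random(seed)
        # Fixed random target weights: never trained.
        self.target = [[rng.gauss(0.0, 1.0) for _ in range(state_dim)]
                       for _ in range(feat_dim)]
        # Predictor weights: trained to imitate the target on visited states.
        self.pred = [[0.0] * state_dim for _ in range(feat_dim)]
        self.lr = lr

    def _errors(self, state):
        """Per-feature difference between predictor and target outputs."""
        errs = []
        for t_row, p_row in zip(self.target, self.pred):
            t_out = math.tanh(sum(w * x for w, x in zip(t_row, state)))
            p_out = sum(w * x for w, x in zip(p_row, state))
            errs.append(p_out - t_out)
        return errs

    def bonus(self, state):
        """Squared prediction error, used as the novelty bonus."""
        return sum(e * e for e in self._errors(state))

    def update(self, state):
        """One gradient step on the squared error for a visited state."""
        for p_row, e in zip(self.pred, self._errors(state)):
            for j, x in enumerate(state):
                p_row[j] -= self.lr * 2.0 * e * x
```

Calling `update` repeatedly on one state drives its bonus toward zero, while a state the predictor has never been trained on keeps its original bonus. Note that the bonus is a function of the state reached, which is what makes RND a next-state bonus method in the abstract's terminology.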
Findings
Action balance exploration outperforms traditional methods in finding unknown states.
Combining action balance with RND improves performance in hard exploration environments.
Experiments on grid world and Atari demonstrate enhanced exploration capabilities.
Abstract
Exploration is a key problem in reinforcement learning. Recently, bonus-based methods, which assign additional bonuses (e.g., intrinsic rewards) to guide the agent to rarely visited states, have achieved considerable success in environments where exploration is difficult, such as Montezuma's Revenge. Since the bonus is calculated according to the novelty of the next state after performing an action, we call such methods next-state bonus methods. However, next-state bonus methods force the agent to pay too much attention to exploring known states and to neglect finding unknown states, since exploration is driven by next states that have already been visited; this may slow the pace of finding reward in some environments. In this paper, we focus on improving the effectiveness of finding unknown states and propose action balance exploration, which balances the frequency of selecting each…
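The abstract is cut off before it defines the method, so the following is only a sketch of the classic tabular idea that the TL;DR says the paper extends: a UCB-style bonus computed from how often each action has been selected in a state, so that rarely chosen actions are favored. It is not the paper's deep-RL algorithm. The class name `ActionCountUCB`, the constant `c`, and the exact bonus form `c * sqrt(ln(total) / n)` are assumptions made for illustration.

```python
import math
from collections import defaultdict


class ActionCountUCB:
    """Tabular per-state action counts with a UCB-style bonus (illustrative only)."""

    def __init__(self, n_actions, c=1.0):
        self.n_actions = n_actions
        self.c = c  # exploration strength (hypothetical parameter)
        # counts[state][a] = number of times action a was selected in state
        self.counts = defaultdict(lambda: [0] * n_actions)

    def bonus(self, state):
        """Bonus per action: infinite if never tried, else c * sqrt(ln(total) / n)."""
        counts = self.counts[state]
        total = sum(counts)
        return [
            math.inf if n == 0 else self.c * math.sqrt(math.log(total) / n)
            for n in counts
        ]

    def select(self, state, q_values):
        """Pick the action maximizing value plus bonus, then record the choice."""
        bonuses = self.bonus(state)
        action = max(range(self.n_actions),
                     key=lambda a: q_values[a] + bonuses[a])
        self.counts[state][action] += 1
        return action
```

With equal value estimates, repeated calls to `select` in the same state cycle through the actions so that their selection counts stay balanced. Unlike a next-state bonus, this bonus depends on the action chosen in the current state rather than on the novelty of the state that follows.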
Taxonomy
Topics: Reinforcement Learning in Robotics · Artificial Intelligence in Games · Advanced Bandit Algorithms Research
