The Exploration-Exploitation Dilemma Revisited: An Entropy Perspective
Renye Yan, Yaozhong Gan, You Wu, Ling Liang, Junliang Xing, Yimao Cai, Ru Huang

TL;DR
This paper introduces AdaZero, an entropy-based adaptive framework for balancing exploration and exploitation in reinforcement learning, significantly improving performance across various environments including Montezuma.
Contribution
It presents a novel entropy perspective on the exploration-exploitation dilemma and develops AdaZero, an end-to-end adaptive method that automatically balances exploration and exploitation.
Findings
AdaZero outperforms baseline models in Atari and MuJoCo environments.
In Montezuma, AdaZero increases final returns by up to fifteen times.
Visualizations show that entropy effectively reflects the adaptive process and agent performance.
Abstract
The imbalance of exploration and exploitation has long been a significant challenge in reinforcement learning. In policy optimization, excessive reliance on exploration reduces learning efficiency, while over-dependence on exploitation might trap agents in local optima. This paper revisits the exploration-exploitation dilemma from the perspective of entropy by revealing the relationship between entropy and the dynamic adaptive process of exploration and exploitation. Based on this theoretical insight, we establish an end-to-end adaptive framework called AdaZero, which automatically determines whether to explore or to exploit as well as their balance of strength. Experiments show that AdaZero significantly outperforms baseline models across various Atari and MuJoCo environments with only a single setting. Especially in the challenging environment of Montezuma, AdaZero boosts the final…
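The link between entropy and the exploration-exploitation balance can be illustrated with a minimal sketch. This is not AdaZero's mechanism, which the abstract does not specify. It only computes the Shannon entropy of a discrete action distribution and shows that a near-uniform (exploratory) policy has high entropy while a near-greedy (exploitative) policy has low entropy. The function names, the example logits, and the softmax temperature used to vary the policy's sharpness are all assumptions made for illustration.

```python
import math

def policy_entropy(probs):
    """Shannon entropy (in nats) of a discrete action distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def softmax(logits, temperature=1.0):
    """Numerically stable softmax; temperature is a hypothetical knob
    used here only to make the policy flatter or sharper."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical action preferences for a 4-action environment.
logits = [2.0, 1.0, 0.5, 0.1]

exploratory = softmax(logits, temperature=10.0)  # close to uniform
greedy = softmax(logits, temperature=0.1)        # close to one-hot

h_explore = policy_entropy(exploratory)
h_greedy = policy_entropy(greedy)
max_entropy = math.log(len(logits))  # entropy of the uniform policy

print(f"exploratory entropy: {h_explore:.4f}")
print(f"greedy entropy:      {h_greedy:.4f}")
print(f"maximum entropy:     {max_entropy:.4f}")
```

Under these assumptions, entropy acts as a scalar summary of how much the policy is still exploring: it approaches log(number of actions) for a uniform policy and zero for a deterministic one.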
Taxonomy
Topics: Military Defense Systems Analysis · Opinion Dynamics and Social Influence
