$\beta$-DQN: Improving Deep Q-Learning By Evolving the Behavior
Hongming Zhang, Fengshuo Bai, Chenjun Xiao, Chao Gao, Bo Xu, Martin Müller

TL;DR
The paper introduces $\beta$-DQN, a simple and efficient exploration method for deep Q-learning that uses a behavior function to generate diverse policies, improving exploration without significant computational cost.
Contribution
It proposes $\beta$-DQN, a novel exploration approach that combines a behavior function with an adaptive policy selection mechanism, enhancing exploration in deep reinforcement learning.
Findings
$\beta$-DQN outperforms baseline methods on various tasks.
The method is easy to implement with minimal overhead.
It effectively balances exploration and bias correction.
Abstract
While many sophisticated exploration methods have been proposed, their lack of generality and high computational cost often lead researchers to favor simpler methods like $\epsilon$-greedy. Motivated by this, we introduce $\beta$-DQN, a simple and efficient exploration method that augments the standard DQN with a behavior function $\beta$. This function estimates the probability that each action has been taken at each state. By leveraging $\beta$, we generate a population of diverse policies that balance exploration between state-action coverage and overestimation bias correction. An adaptive meta-controller is designed to select an effective policy for each episode, enabling flexible and explainable exploration. $\beta$-DQN is straightforward to implement and adds minimal computational overhead to the standard DQN. Experiments on both simple and challenging exploration domains show…
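To make the role of the behavior function concrete, here is a minimal tabular sketch in Python. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not say how $\beta$ is learned or how the policies are built from it, so the count-based estimate, the class `TabularBehavior`, the helpers `least_tried_action` and `masked_greedy_action`, and the `threshold` parameter are all hypothetical. The first helper illustrates a coverage-oriented choice (prefer the action taken least often at a state). The second illustrates one possible form of bias correction (act greedily only among actions that have actually been tried).

```python
class TabularBehavior:
    """Count-based stand-in for beta(a | s): the fraction of visits to
    state s in which action a was taken. (Assumption: the paper augments
    a deep Q-network; a lookup table is used here only for clarity.)"""

    def __init__(self, n_actions):
        self.n_actions = n_actions
        self.counts = {}  # state -> list of per-action visit counts

    def update(self, state, action):
        # Record that `action` was taken in `state`.
        row = self.counts.setdefault(state, [0] * self.n_actions)
        row[action] += 1

    def probs(self, state):
        # Empirical action frequencies; uniform for a never-visited state.
        row = self.counts.get(state)
        if row is None:
            return [1.0 / self.n_actions] * self.n_actions
        total = sum(row)
        return [c / total for c in row]


def least_tried_action(beta, state):
    """Coverage-style choice: the action with the lowest beta(a | s).
    Ties go to the lowest action index."""
    p = beta.probs(state)
    return min(range(len(p)), key=lambda a: p[a])


def masked_greedy_action(beta, q_values, state, threshold):
    """Correction-style choice: greedy over Q, restricted to actions with
    beta(a | s) >= threshold. Falls back to all actions if none qualify."""
    p = beta.probs(state)
    allowed = [a for a in range(len(p)) if p[a] >= threshold]
    if not allowed:
        allowed = list(range(len(p)))
    return max(allowed, key=lambda a: q_values[a])


if __name__ == "__main__":
    beta = TabularBehavior(n_actions=3)
    for a in (0, 0, 0, 1):
        beta.update("s", a)
    print(beta.probs("s"))               # action frequencies at "s"
    print(least_tried_action(beta, "s"))  # the never-tried action
    # Action 2 has the largest Q-value but was never taken, so it is masked.
    print(masked_greedy_action(beta, [1.0, 2.0, 10.0], "s", threshold=0.1))
```

In this toy setup, varying `threshold` (or mixing the two helpers) yields a family of policies ranging from exploratory to conservative, which is the kind of population a per-episode meta-controller could choose from. The selection rule itself is not described in the abstract and is left out here.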
Taxonomy
Topics: Neural Networks and Applications · Machine Learning and ELM · Face and Expression Recognition
Methods: Dense Connections · Q-Learning · Convolution · Deep Q-Network
