Perturbation-based exploration methods in deep reinforcement learning
Sneha Aenugu

TL;DR
This paper investigates whether the gains attributed to structured exploration in deep reinforcement learning come from the structure itself or simply from the policy and reward perturbations those methods introduce, and shows that such noise alone can significantly enhance exploration.
Contribution
The study reveals that perturbations in policy and reward spaces can substantially improve exploration, challenging the emphasis on structured exploration methods.
Findings
Perturbing the policy just before the softmax layer boosts exploration.
Introducing sporadic reward bonuses enhances exploration.
Noisy exploration can outperform structured exploration methods.
Abstract
Recent research on structured exploration has emphasized identifying novel states in the state space and incentivizing the agent to revisit them through intrinsic reward bonuses. We question whether the performance boost demonstrated by these methods is indeed due to the discovery of structure in the agent's exploratory schedule, or whether the benefit is largely attributable to the perturbations in policy and reward space that arise in the pursuit of structured exploration. We investigate the effect of perturbations in the policy and reward spaces on the agent's exploratory behavior. We show that simply perturbing the policy just before the softmax layer and introducing sporadic reward bonuses into the domain can greatly enhance exploration in several domains of the Arcade Learning Environment. In light of these findings, we recommend…
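The two perturbations named in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the Gaussian noise model, the fixed-probability bonus, and the names `perturbed_policy`, `sporadic_bonus`, `noise_std`, `bonus`, and `prob` are all assumptions chosen for illustration, since the abstract does not specify the noise distribution or the bonus schedule.

```python
import numpy as np


def softmax(logits):
    """Numerically stable softmax over a 1-D array of logits."""
    z = logits - np.max(logits)
    e = np.exp(z)
    return e / e.sum()


def perturbed_policy(logits, rng, noise_std=0.5):
    """Add zero-mean Gaussian noise to the logits, then apply softmax.

    Assumption: Gaussian noise with a fixed standard deviation. The
    source only says the policy is perturbed just before the softmax.
    """
    noisy = logits + rng.normal(0.0, noise_std, size=logits.shape)
    return softmax(noisy)


def sporadic_bonus(reward, rng, bonus=1.0, prob=0.01):
    """Occasionally add a fixed bonus to the environment reward.

    Assumption: the bonus fires independently at each step with
    probability `prob`. The source only calls the bonuses "sporadic".
    """
    if rng.random() < prob:
        return reward + bonus
    return reward


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    logits = np.array([1.0, 2.0, 3.0])      # hypothetical policy-head output
    probs = perturbed_policy(logits, rng)   # noisy action distribution
    action = rng.choice(len(probs), p=probs)
    shaped = sporadic_bonus(0.0, rng)       # reward actually used for learning
    print(action, shaped)
```

Setting `noise_std=0.0` or `prob=0.0` recovers the unperturbed policy and reward, which makes it straightforward to compare a noisy agent against its baseline under otherwise identical settings.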
Taxonomy
Topics: Reinforcement Learning in Robotics · Modular Robots and Swarm Intelligence · Distributed Control Multi-Agent Systems
Methods: Softmax
