Exploration Behavior of Untrained Policies
Jacob Adamczyk

TL;DR
This paper investigates how the architecture of untrained deep neural policies influences exploration in reinforcement learning, providing theoretical insights and practical strategies to understand and improve initial exploration behaviors.
Contribution
The paper offers a theoretical and empirical framework linking neural network architecture to exploration behavior in RL, highlighting the role of policy initialization.
Findings
Untrained policies generate correlated actions leading to specific exploration patterns.
Infinite-width network theory explains the distribution of untrained policy trajectories.
Policy architecture influences initial exploration behavior in RL environments.
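The first two findings can be made concrete in the simplest case. Assume, for illustration only, a scalar-output policy with one hidden ReLU layer, Gaussian weights, and no biases. For this network, the output covariance across random initializations has a closed form: the degree-1 arc-cosine kernel of Cho and Saul. In the infinite-width limit, outputs at different states become jointly Gaussian with this covariance. The sketch below compares the normalized kernel with a Monte Carlo estimate over many untrained networks. The architecture and the function names (`relu_nngp_correlation`, `sample_untrained_outputs`) are hypothetical and are not taken from the paper.

```python
import numpy as np

def relu_nngp_correlation(x1, x2):
    """Predicted correlation between the outputs of a random one-hidden-layer
    ReLU net (no biases) at inputs x1 and x2: the degree-1 arc-cosine kernel,
    normalized so that identical inputs give correlation 1."""
    cos_t = np.dot(x1, x2) / (np.linalg.norm(x1) * np.linalg.norm(x2))
    theta = np.arccos(np.clip(cos_t, -1.0, 1.0))
    return (np.sin(theta) + (np.pi - theta) * np.cos(theta)) / np.pi

def sample_untrained_outputs(x1, x2, width=256, n_nets=4000, seed=0):
    """Evaluate many freshly initialized (hypothetical) policies at the same
    two states. Returns an array of shape (n_nets, 2): one scalar action per
    state per network."""
    rng = np.random.default_rng(seed)
    d = x1.shape[0]
    X = np.stack([x1, x2])                                   # (2, d)
    # Gaussian init with 1/fan_in variance in both layers.
    W = rng.normal(0.0, 1.0 / np.sqrt(d), size=(n_nets, width, d))
    v = rng.normal(0.0, 1.0 / np.sqrt(width), size=(n_nets, width))
    H = np.maximum(W @ X.T, 0.0)                             # (n_nets, width, 2)
    return np.einsum("nw,nwk->nk", v, H)                     # (n_nets, 2)

if __name__ == "__main__":
    x1 = np.array([1.0, 0.0])
    x2 = np.array([np.cos(0.3), np.sin(0.3)])                # nearby state
    out = sample_untrained_outputs(x1, x2)
    print("predicted:", relu_nngp_correlation(x1, x2))
    print("empirical:", np.corrcoef(out[:, 0], out[:, 1])[0, 1])
```

Under these assumptions, states separated by a small angle receive strongly correlated actions (about 0.96 at 0.3 rad). Antipodal states give uncorrelated actions. An untrained policy of this kind therefore does not behave like independent random action noise.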
Abstract
Exploration remains a fundamental challenge in reinforcement learning (RL), particularly in environments with sparse or adversarial reward structures. In this work, we study how the architecture of deep neural policies implicitly shapes exploration before training. We theoretically and empirically demonstrate strategies for generating ballistic or diffusive trajectories from untrained policies in a toy model. Using the theory of infinite-width networks and a continuous-time limit, we show that untrained policies return correlated actions and result in non-trivial state-visitation distributions. We discuss the distributions of the corresponding trajectories for a standard architecture, revealing insights into inductive biases for tackling exploration. Our results establish a theoretical and experimental framework for using policy initialization as a design tool to understand exploration…
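The abstract contrasts ballistic and diffusive trajectories. A standard way to distinguish them is how the mean-squared displacement scales with time, MSD(t) ∝ t^α: ballistic motion has α ≈ 2 and diffusive motion has α ≈ 1. The sketch below assumes a hypothetical 2D point-mass toy model, not necessarily the one used in the paper, in which the state moves by `dt` times the action. It compares a deterministic, untrained ReLU policy with i.i.d. Gaussian actions (scaled by `sqrt(dt)`, as in a Brownian random walk). All names and parameter values are illustrative assumptions.

```python
import numpy as np

def init_mlp(rng, d_in=2, width=128, d_out=2):
    """Hypothetical untrained policy: one hidden ReLU layer, Gaussian init."""
    return {
        "W": rng.normal(0.0, 1.0 / np.sqrt(d_in), size=(width, d_in)),
        "b": rng.normal(0.0, 1.0, size=width),
        "V": rng.normal(0.0, 1.0 / np.sqrt(width), size=(d_out, width)),
    }

def mlp_action(params, s):
    """Deterministic action of the untrained policy at state s."""
    return params["V"] @ np.maximum(params["W"] @ s + params["b"], 0.0)

def rollout(rng, steps, dt, mode):
    """Squared displacement from the start state at every step.
    mode == "policy": s_{k+1} = s_k + dt * a(s_k) with a fresh untrained policy.
    mode == "iid":    s_{k+1} = s_k + sqrt(dt) * xi_k, xi_k ~ N(0, I)."""
    s = rng.normal(size=2)
    start = s.copy()
    params = init_mlp(rng) if mode == "policy" else None
    sq_disp = np.zeros(steps + 1)
    for k in range(1, steps + 1):
        if mode == "policy":
            s = s + dt * mlp_action(params, s)
        else:
            s = s + np.sqrt(dt) * rng.normal(size=2)
        sq_disp[k] = np.sum((s - start) ** 2)
    return sq_disp

def msd_exponent(mode, n_trials=2000, steps=20, dt=0.01, seed=0):
    """Estimate alpha in MSD(t) ~ t^alpha from the ratio MSD(2t) / MSD(t)."""
    rng = np.random.default_rng(seed)
    msd = np.mean([rollout(rng, steps, dt, mode) for _ in range(n_trials)], axis=0)
    return np.log2(msd[steps] / msd[steps // 2])

if __name__ == "__main__":
    print("untrained policy alpha:", msd_exponent("policy"))
    print("i.i.d. actions alpha:  ", msd_exponent("iid"))
```

The exponent is estimated only over a short horizon (t ≤ 0.2 here). Over longer horizons, a deterministic flow may slow down near fixed points of the policy's vector field, so the short-time exponent is the cleaner diagnostic in this toy setting.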
Taxonomy
Topics: Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning · Stochastic Gradient Optimization Techniques
