Exploration Behavior of Untrained Policies

Jacob Adamczyk

arXiv:2506.22566 · cs.LG · July 28, 2025

TL;DR

This paper investigates how the architecture of untrained deep neural policies influences exploration in reinforcement learning, providing theoretical insights and practical strategies to understand and improve initial exploration behaviors.

Contribution

The paper offers a theoretical and empirical framework linking neural policy architecture to exploration behavior in RL, highlighting the role of policy initialization.

Findings

1. Untrained policies generate correlated actions, leading to specific exploration patterns.

2. Infinite-width network theory explains the distribution of untrained policy trajectories.

3. Policy architecture influences initial exploration behavior in RL environments.
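
To make the notion of correlated actions at initialization concrete, here is a minimal sketch under stated assumptions. It uses a one-hidden-layer ReLU network with standard Gaussian weights and no biases, which is an illustrative choice and may differ from the architecture analyzed in the paper. For such a network, the covariance between outputs at two states, taken over random initializations, is given by the standard degree-1 arc-cosine kernel. The function names and the example states are hypothetical.

```python
import numpy as np

def arccos_kernel(x, y):
    """E[relu(w.x) * relu(w.y)] for w ~ N(0, I): the degree-1 arc-cosine kernel."""
    nx, ny = np.linalg.norm(x), np.linalg.norm(y)
    cos = np.clip(x @ y / (nx * ny), -1.0, 1.0)
    theta = np.arccos(cos)
    return nx * ny * (np.sin(theta) + (np.pi - theta) * cos) / (2.0 * np.pi)

def action_correlation(x, y):
    """Correlation (over initializations) of the network output at states x and y."""
    return arccos_kernel(x, y) / np.sqrt(arccos_kernel(x, x) * arccos_kernel(y, y))

def empirical_kernel(x, y, n_samples=200_000, seed=0):
    """Monte Carlo estimate of the same expectation, as a sanity check."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((n_samples, x.size))  # one row per hidden unit
    return float(np.mean(np.maximum(W @ x, 0.0) * np.maximum(W @ y, 0.0)))

# Hypothetical example states on the unit circle.
s = np.array([1.0, 0.0])
s_near = np.array([np.cos(0.1), np.sin(0.1)])  # 0.1 rad away from s
s_far = np.array([0.0, 1.0])                   # orthogonal to s

print("analytic vs. sampled kernel:", arccos_kernel(s, s_near), empirical_kernel(s, s_near))
print("correlation at nearby states: ", action_correlation(s, s_near))
print("correlation at distant states:", action_correlation(s, s_far))
```

Under these assumptions, actions at nearby states are almost perfectly correlated while actions at distant states are only weakly correlated, so an agent that moves a small distance per step keeps receiving similar actions.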

Abstract

Exploration remains a fundamental challenge in reinforcement learning (RL), particularly in environments with sparse or adversarial reward structures. In this work, we study how the architecture of deep neural policies implicitly shapes exploration before training. We theoretically and empirically demonstrate strategies for generating ballistic or diffusive trajectories from untrained policies in a toy model. Using the theory of infinite-width networks and a continuous-time limit, we show that untrained policies return correlated actions and result in non-trivial state-visitation distributions. We discuss the distributions of the corresponding trajectories for a standard architecture, revealing insights into inductive biases for tackling exploration. Our results establish a theoretical and experimental framework for using policy initialization as a design tool to understand exploration…
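
The distinction between ballistic and diffusive trajectories can be illustrated with a small simulation. The sketch below is not the paper's experimental setup. It assumes a hypothetical toy model in which the state is a point updated as `s_next = s + dt * a`, a one-hidden-layer tanh policy with Gaussian-initialized weights, and a short horizon. It compares the untrained policy against i.i.d. Gaussian actions by fitting the exponent `alpha` in MSD(t) ∝ t^alpha, where MSD is the mean-squared displacement averaged over random seeds. An exponent near 2 indicates ballistic motion and an exponent near 1 indicates diffusive motion.

```python
import numpy as np

def init_policy(rng, state_dim=2, width=256):
    """Draw weights for a one-hidden-layer tanh policy (architecture is an assumption)."""
    W1 = rng.standard_normal((width, state_dim)) / np.sqrt(state_dim)
    b1 = rng.standard_normal(width)
    W2 = rng.standard_normal((state_dim, width)) / np.sqrt(width)
    return W1, b1, W2

def policy_action(params, s):
    """Deterministic action of the untrained policy at state s."""
    W1, b1, W2 = params
    return W2 @ np.tanh(W1 @ s + b1)

def rollout(action_fn, steps=30, dt=0.01, state_dim=2):
    """Integrate the assumed toy dynamics s <- s + dt * a, starting from the origin."""
    s = np.zeros(state_dim)
    traj = np.empty((steps, state_dim))
    for t in range(steps):
        s = s + dt * action_fn(s)
        traj[t] = s
    return traj

def msd_exponent(trajs):
    """Slope of log(MSD) against log(t), with MSD averaged over trajectories."""
    msd = np.mean(np.sum(trajs ** 2, axis=-1), axis=0)
    t = np.arange(1, msd.size + 1)
    slope, _ = np.polyfit(np.log(t), np.log(msd), 1)
    return float(slope)

def compare(n_runs=500, seed=0):
    """Return (exponent for untrained policies, exponent for i.i.d. Gaussian actions)."""
    rng = np.random.default_rng(seed)
    policy_trajs, noise_trajs = [], []
    for _ in range(n_runs):
        params = init_policy(rng)  # a fresh random initialization per run
        policy_trajs.append(rollout(lambda s: policy_action(params, s)))
        noise_trajs.append(rollout(lambda s: rng.standard_normal(s.size)))
    return msd_exponent(np.array(policy_trajs)), msd_exponent(np.array(noise_trajs))

alpha_policy, alpha_noise = compare()
print(f"untrained policy exponent: {alpha_policy:.2f}")
print(f"i.i.d. action exponent:    {alpha_noise:.2f}")
```

The horizon is deliberately short so that the state moves only a small distance. Over longer horizons the policy's output changes with the state, and the scaling need not remain ballistic.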

Taxonomy

Topics: Reinforcement Learning in Robotics · Adversarial Robustness in Machine Learning · Stochastic Gradient Optimization Techniques