Where-to-Learn: Analytical Policy Gradient Directed Exploration for On-Policy Robotic Reinforcement Learning

Leixin Chang; Xinchen Yao; Ben Liu; Liangjing Yang; Hua Chen

arXiv:2603.27317·cs.RO·April 2, 2026

TL;DR

This paper introduces a directed exploration method for on-policy robotic reinforcement learning. It uses analytical policy gradients from a differentiable dynamics model to steer the agent towards high-reward regions, with the aim of faster and more effective policy learning.

Contribution

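It presents a directed exploration approach that uses physics-guided analytical policy gradients to improve on-policy RL in robotics. Unlike methods that maximize policy entropy or reward visits to novel states, it takes the potential value of the explored states into account.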

Findings

01 Accelerated policy learning in robotic control tasks.

02 Effective steering towards high-reward regions.

03 Improved sample efficiency over traditional exploration methods.

Abstract

On-policy reinforcement learning (RL) algorithms have demonstrated great potential in robotic control, where effective exploration is crucial for efficient and high-quality policy learning. However, how to encourage the agent to explore better trajectories efficiently remains a challenge. Most existing methods incentivize exploration by maximizing the policy entropy or by encouraging visits to novel states, regardless of the potential value of those states. We propose a new form of directed exploration that uses analytical policy gradients from a differentiable dynamics model to inject task-aware, physics-guided guidance, thereby steering the agent towards high-reward regions for accelerated and more effective policy learning.
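
The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a toy 1D point mass with differentiable dynamics x_{t+1} = x_t + DT * a_t and reward -(x - GOAL)^2. All names here (DT, GOAL, rollout, analytical_gradient, explore, sigma, beta) are hypothetical. The sketch differentiates the return through the dynamics with a backward pass and uses that gradient to shift randomly sampled exploratory actions towards higher return.

```python
import random

DT = 0.1    # integration step of the toy dynamics (assumed)
GOAL = 1.0  # target position that defines the reward (assumed)

def rollout(x0, actions):
    """Simulate x_{t+1} = x_t + DT * a_t; return (total reward, visited states)."""
    x, states, total = x0, [], 0.0
    for a in actions:
        x = x + DT * a
        states.append(x)
        total += -(x - GOAL) ** 2
    return total, states

def analytical_gradient(x0, actions):
    """Gradient of the return w.r.t. each action via a backward pass."""
    _, states = rollout(x0, actions)
    grad = [0.0] * len(actions)
    adjoint = 0.0  # derivative of the remaining return w.r.t. the current state
    for t in reversed(range(len(actions))):
        adjoint += -2.0 * (states[t] - GOAL)  # reward term at step t
        grad[t] = adjoint * DT                # d(state)/d(action) = DT
    return grad

def explore(x0, mean_actions, sigma, beta, rng):
    """Return (undirected, directed) exploratory action sequences.

    undirected: policy mean plus Gaussian noise.
    directed:   the same sample shifted by beta times the analytical gradient.
    """
    undirected = [m + rng.gauss(0.0, sigma) for m in mean_actions]
    grad = analytical_gradient(x0, undirected)
    directed = [a + beta * g for a, g in zip(undirected, grad)]
    return undirected, directed

if __name__ == "__main__":
    rng = random.Random(0)
    plain, directed = explore(0.0, [0.0] * 10, sigma=0.3, beta=0.5, rng=rng)
    print("undirected return:", rollout(0.0, plain)[0])
    print("directed return:  ", rollout(0.0, directed)[0])
```

In this toy setting the return is a concave quadratic in the actions, so a small enough beta cannot lower the return of the sampled sequence. Real robotic dynamics are not this benign, and how the gradient signal is combined with an on-policy update is exactly the part the paper itself would need to specify.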
