A Differential and Pointwise Control Approach to Reinforcement Learning
Minh Nguyen, Chandrajit Bajaj

TL;DR
This paper introduces Differential Reinforcement Learning, a continuous-time control framework that embeds physics priors, improves sample efficiency, and comes with pointwise convergence guarantees. It reports better performance than standard RL on scientific computing tasks.
Contribution
The paper presents Differential RL and Differential Policy Optimization (dfPO), a pointwise algorithm that implements it. The framework embeds physics priors, and the authors provide convergence guarantees and regret bounds aimed at RL for scientific computing.
Findings
Outperforms standard RL on scientific tasks
Ensures physically consistent trajectories
Provides convergence guarantees and regret bounds
Abstract
Reinforcement learning (RL) in continuous state-action spaces remains challenging in scientific computing due to poor sample efficiency and lack of pathwise physical consistency. We introduce Differential Reinforcement Learning (Differential RL), a novel framework that reformulates RL from a continuous-time control perspective via a differential dual formulation. This induces a Hamiltonian structure that embeds physics priors and ensures consistent trajectories without requiring explicit constraints. To implement Differential RL, we develop Differential Policy Optimization (dfPO), a pointwise, stage-wise algorithm that refines local movement operators along the trajectory for improved sample efficiency and dynamic alignment. We establish pointwise convergence guarantees, a property not available in standard RL, and derive a competitive theoretical regret bound of .…
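The link between a Hamiltonian structure and physically consistent trajectories can be illustrated with a toy example. The sketch below is not dfPO and does not come from the paper: it assumes a unit-mass harmonic oscillator (a hypothetical choice) and compares a plain explicit Euler update with a symplectic Euler update, which respects the Hamiltonian structure of the dynamics. All function names are illustrative.

```python
import numpy as np

def hamiltonian(q, p):
    # Hypothetical toy system: unit-mass harmonic oscillator, H = p^2/2 + q^2/2.
    return 0.5 * p**2 + 0.5 * q**2

def explicit_euler_step(q, p, dt):
    # Hamilton's equations: dq/dt = dH/dp = p, dp/dt = -dH/dq = -q.
    # Both updates use the old state, which ignores the symplectic structure.
    return q + dt * p, p - dt * q

def symplectic_euler_step(q, p, dt):
    # Update the momentum first, then use the new momentum for the position.
    p_new = p - dt * q
    q_new = q + dt * p_new
    return q_new, p_new

def rollout_energy(step, q0, p0, dt, n_steps):
    # Record the energy H(q, p) along a trajectory generated by `step`.
    q, p = q0, p0
    energies = [hamiltonian(q, p)]
    for _ in range(n_steps):
        q, p = step(q, p, dt)
        energies.append(hamiltonian(q, p))
    return np.array(energies)

e_explicit = rollout_energy(explicit_euler_step, 1.0, 0.0, 0.1, 1000)
e_symplectic = rollout_energy(symplectic_euler_step, 1.0, 0.0, 0.1, 1000)

# Energy error at the end of the explicit rollout, and the worst-case
# energy error anywhere along the symplectic rollout.
drift_explicit = abs(e_explicit[-1] - e_explicit[0])
drift_symplectic = np.max(np.abs(e_symplectic - e_symplectic[0]))
print(f"explicit Euler energy drift:   {drift_explicit:.3e}")
print(f"symplectic Euler energy drift: {drift_symplectic:.3e}")
```

In this toy setting the explicit update lets the energy grow without bound, while the structure-preserving update keeps it within a small band around its initial value. That is the general kind of pathwise consistency a Hamiltonian formulation is meant to provide, shown here without any learning component.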
Taxonomy
Topics: Advanced Manufacturing and Logistics Optimization
Methods: Direct Preference Optimization · Focus
