A Differential and Pointwise Control Approach to Reinforcement Learning

Minh Nguyen; Chandrajit Bajaj

arXiv:2404.15617 · cs.LG · February 6, 2026

TL;DR

This paper introduces Differential Reinforcement Learning, a continuous-time control framework for RL that embeds physics priors, improves sample efficiency, and provides pointwise convergence guarantees. It outperforms standard RL on scientific computing tasks.

Contribution

The paper presents Differential RL and a pointwise algorithm that implements it, Differential Policy Optimization (dfPO). The framework embeds physics priors through its Hamiltonian structure and comes with pointwise convergence guarantees and an $O(K^{5/6})$ regret bound.

Findings

01. Outperforms standard RL on scientific tasks
02. Ensures physically consistent trajectories
03. Provides convergence guarantees and regret bounds

Abstract

Reinforcement learning (RL) in continuous state-action spaces remains challenging in scientific computing due to poor sample efficiency and lack of pathwise physical consistency. We introduce Differential Reinforcement Learning (Differential RL), a novel framework that reformulates RL from a continuous-time control perspective via a differential dual formulation. This induces a Hamiltonian structure that embeds physics priors and ensures consistent trajectories without requiring explicit constraints. To implement Differential RL, we develop Differential Policy Optimization (dfPO), a pointwise, stage-wise algorithm that refines local movement operators along the trajectory for improved sample efficiency and dynamic alignment. We establish pointwise convergence guarantees, a property not available in standard RL, and derive a competitive theoretical regret bound of $O(K^{5/6})$. …
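
As a reading aid for the regret bound, here is a short worked step. It assumes that $K$ counts episodes or learning stages (the abstract does not define it), writes $\mathrm{Regret}(K)$ for cumulative regret, and uses a hypothetical constant $C$ standing in for whatever the $O(\cdot)$ hides. Under those assumptions the bound is sublinear, so the average regret vanishes:

$$\mathrm{Regret}(K) \le C\,K^{5/6} \quad\Longrightarrow\quad \frac{\mathrm{Regret}(K)}{K} \le C\,K^{-1/6} \to 0 \quad \text{as } K \to \infty.$$

The decay is slow: because the average scales as $K^{-1/6}$, halving this upper bound on average regret takes $2^{6} = 64$ times as large a $K$.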

Videos

A Differential and Pointwise Control Approach to Reinforcement Learning · SlidesLive

Taxonomy

Topics: Advanced Manufacturing and Logistics Optimization

Methods: Direct Preference Optimization · Focus