Decoupled Continuous-Time Reinforcement Learning via Hamiltonian Flow

Minh Nguyen

arXiv:2602.14587·cs.LG·February 17, 2026

TL;DR

This paper introduces a decoupled continuous-time reinforcement learning algorithm that uses Hamiltonian flows to learn stably in non-uniform, event-driven control problems, and reports better results than prior continuous-time RL methods on benchmarks.

Contribution

The paper proposes a decoupled actor-critic approach that combines a Hamiltonian-based value flow with diffusion-generator-based $q$-learning, and supports it with convergence proofs and empirical comparisons against prior methods.

Findings

01 Outperforms prior continuous-time RL methods on benchmarks.

02 Achieves 21% profit in a real-world trading task.

03 Provides theoretical convergence guarantees for the proposed algorithm.

Abstract

Many real-world control problems, ranging from finance to robotics, evolve in continuous time with non-uniform, event-driven decisions. Standard discrete-time reinforcement learning (RL), based on fixed-step Bellman updates, struggles in this setting: as time gaps shrink, the $Q$-function collapses to the value function $V$, eliminating action ranking. Existing continuous-time methods reintroduce action information via an advantage-rate function $q$. However, they enforce optimality through complicated martingale losses or orthogonality constraints, which are sensitive to the choice of test processes. These approaches entangle $V$ and $q$ into a large, complex optimization problem that is difficult to train reliably. To address these limitations, we propose a novel decoupled continuous-time actor-critic algorithm with alternating updates: $q$ is learned from diffusion generators on $V$,…
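The collapse of $Q$ onto $V$ described in the abstract can be checked numerically on a toy problem. The sketch below is not taken from the paper: it assumes a hypothetical deterministic, discounted linear-quadratic problem ($\dot{x} = a$, reward rate $-(x^2 + a^2)$, discount rate $\beta$), for which the optimal value is $V(x) = -p x^2$ with $p^2 + \beta p - 1 = 0$, and the advantage rate works out to $q(x,a) = -(a + p x)^2$. It relies on the standard small-step expansion $Q_{\Delta t}(x,a) = V(x) + q(x,a)\,\Delta t + o(\Delta t)$, where $Q_{\Delta t}$ is the value of holding action $a$ for $\Delta t$ and acting optimally afterwards. All function names are illustrative.

```python
import math

BETA = 1.0  # discount rate (assumed for this toy problem)
# Positive root of p^2 + BETA*p - 1 = 0, from the HJB equation of the toy problem
P = (-BETA + math.sqrt(BETA**2 + 4.0)) / 2.0

def value(x):
    """Optimal value V(x) = -P x^2 of the toy problem."""
    return -P * x * x

def reward(x, a):
    """Running reward rate r(x, a) = -(x^2 + a^2)."""
    return -(x * x + a * a)

def q_rate(x, a):
    """Advantage rate q = r + V'(x) a - BETA V, which simplifies to -(a + P x)^2."""
    return -(a + P * x) ** 2

def q_dt(x, a, dt, n=2000):
    """Q for step dt: hold action a for dt (midpoint quadrature), then act optimally."""
    h = dt / n
    total = 0.0
    for i in range(n):
        s = (i + 0.5) * h
        total += math.exp(-BETA * s) * reward(x + a * s, a) * h
    return total + math.exp(-BETA * dt) * value(x + a * dt)

def action_gap(x, a1, a2, dt):
    """Difference in Q between two actions at step size dt."""
    return q_dt(x, a1, dt) - q_dt(x, a2, dt)

if __name__ == "__main__":
    x, a_opt, a_bad = 1.0, -P * 1.0, 1.0
    for dt in (1e-1, 1e-2, 1e-3):
        gap = action_gap(x, a_opt, a_bad, dt)
        rate = (q_dt(x, a_bad, dt) - value(x)) / dt
        print(f"dt={dt:g}  Q-gap={gap:.6f}  (Q-V)/dt={rate:.4f}  q={q_rate(x, a_bad):.4f}")
```

In this toy setting the gap between the two actions shrinks roughly in proportion to $\Delta t$, while the rescaled difference $(Q_{\Delta t} - V)/\Delta t$ stays close to $q$, which is why a rate function of this kind can still rank actions when the step size is small.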

Taxonomy

Topics: Reinforcement Learning in Robotics · Adaptive Dynamic Programming Control · Advanced Bandit Algorithms Research