Decoupled Continuous-Time Reinforcement Learning via Hamiltonian Flow
Minh Nguyen

TL;DR
This paper introduces a novel decoupled continuous-time reinforcement learning algorithm that leverages Hamiltonian flows for stable and effective learning in non-uniform, event-driven control problems, outperforming existing methods.
Contribution
It proposes a decoupled actor-critic approach with Hamiltonian-based value flow and diffusion generator-based $q$-learning, with rigorous convergence proofs and superior empirical performance.
Findings
Outperforms prior continuous-time RL methods on benchmarks.
Achieves 21% profit in a real-world trading task.
Provides theoretical convergence guarantees for the proposed algorithm.
Abstract
Many real-world control problems, ranging from finance to robotics, evolve in continuous time with non-uniform, event-driven decisions. Standard discrete-time reinforcement learning (RL), based on fixed-step Bellman updates, struggles in this setting: as time gaps shrink, the $Q$-function collapses to the value function $V$, eliminating action ranking. Existing continuous-time methods reintroduce action information via an advantage-rate function $q$. However, they enforce optimality through complicated martingale losses or orthogonality constraints, which are sensitive to the choice of test processes. These approaches entangle $V$ and $q$ into a large, complex optimization problem that is difficult to train reliably. To address these limitations, we propose a novel decoupled continuous-time actor-critic algorithm with alternating updates: $q$ is learned from diffusion generators on $V$,…
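The collapse described in the abstract can be illustrated numerically. The sketch below is not the paper's algorithm; it is a minimal toy under stated assumptions: a one-dimensional deterministic system (the diffusion term is dropped for simplicity), a hypothetical reward, drift, and discount rate, and a hand-picked smooth function standing in for $V$. It compares the short-horizon $Q$-value over a step of length `dt` with the first-order rate $q(x,a) = r(x,a) + V'(x)\,f(x,a) - \beta V(x)$, which is the deterministic form of an advantage-rate function.

```python
import math

BETA = 0.1  # discount rate (assumed for illustration)

def reward(x, a):
    # Hypothetical running reward rate r(x, a).
    return -(x ** 2 + a ** 2)

def drift(x, a):
    # Hypothetical deterministic dynamics dx/dt = f(x, a) = a.
    return a

def value(x):
    # Hand-picked smooth stand-in for V; not a learned or optimal value.
    return -(x ** 2)

def value_grad(x):
    # Derivative of the stand-in V.
    return -2.0 * x

def q_dt(x, a, dt):
    # Short-horizon Q: hold action a for dt, then continue with value V.
    return reward(x, a) * dt + math.exp(-BETA * dt) * value(x + drift(x, a) * dt)

def q_rate(x, a):
    # First-order rate: r + V' * f - beta * V (no diffusion term here).
    return reward(x, a) + value_grad(x) * drift(x, a) - BETA * value(x)

def collapse_rows(x, actions, dts):
    # For each dt, report the spread of Q across actions and of (Q - V) / dt.
    rows = []
    for dt in dts:
        qs = [q_dt(x, a, dt) for a in actions]
        rates = [(q - value(x)) / dt for q in qs]
        rows.append((dt, max(qs) - min(qs), max(rates) - min(rates)))
    return rows

if __name__ == "__main__":
    for dt, q_spread, rate_spread in collapse_rows(1.0, [-1.0, 0.5], [1e-1, 1e-2, 1e-3]):
        print(f"dt={dt:g}  spread of Q={q_spread:.6f}  spread of (Q-V)/dt={rate_spread:.6f}")
```

In this toy, the spread of `q_dt` across actions shrinks in proportion to `dt`, so the $Q$-values alone stop ranking actions as the step shrinks, while the rescaled quantity `(q_dt - value) / dt` stays separated and approaches `q_rate`.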
Taxonomy
Topics: Reinforcement Learning in Robotics · Adaptive Dynamic Programming Control · Advanced Bandit Algorithms Research
