Sample and Computationally Efficient Continuous-Time Reinforcement Learning with General Function Approximation
Runze Zhao, Yue Yu, Adams Yiyue Zhu, Chen Yang, Dongruo Zhou

TL;DR
This paper introduces a theoretically grounded, sample-efficient, and computationally practical continuous-time reinforcement learning algorithm that leverages general function approximation, optimism-based confidence sets, and structured policy updates, with empirical validation on control tasks.
Contribution
The paper provides the first sample complexity guarantee for continuous-time reinforcement learning (CTRL) with general function approximation and proposes structured policy updates that reduce computational cost.
Findings
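Achieves a near-optimal policy with a suboptimality gap that decays as N^{-1/2} in the number of measurements N, with a leading factor that depends on the distributional Eluder dimensions of the reward and dynamics functions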
Reduces number of policy updates and rollouts while maintaining performance
Demonstrates competitive results on continuous control and diffusion model tasks
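
To make the N^{-1/2} rate in the first finding concrete, here is a minimal sketch of what such a rate implies for the number of measurements. The name `constant` is a hypothetical stand-in for the problem-dependent leading factor (Eluder dimensions and any logarithmic terms); it is not a value taken from the paper, and both function names are illustrative only.

```python
import math


def gap_bound(n_measurements: int, constant: float = 1.0) -> float:
    """Upper bound of the assumed form constant * N^{-1/2}."""
    if n_measurements <= 0 or constant <= 0:
        raise ValueError("n_measurements and constant must be positive")
    return constant / math.sqrt(n_measurements)


def measurements_for_gap(target_gap: float, constant: float = 1.0) -> int:
    """Measurements suggested by solving constant * N^{-1/2} = target_gap for N."""
    if target_gap <= 0 or constant <= 0:
        raise ValueError("target_gap and constant must be positive")
    # Rearranging the assumed bound gives N = (constant / target_gap)^2.
    return math.ceil((constant / target_gap) ** 2)


if __name__ == "__main__":
    for gap in (0.5, 0.25, 0.125):
        print(gap, measurements_for_gap(gap))
```

Under this assumed form, halving the target gap multiplies the required number of measurements by four, which is the usual consequence of an inverse-square-root rate.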
Abstract
Continuous-time reinforcement learning (CTRL) provides a principled framework for sequential decision-making in environments where interactions evolve continuously over time. Despite its empirical success, the theoretical understanding of CTRL remains limited, especially in settings with general function approximation. In this work, we propose a model-based CTRL algorithm that achieves both sample and computational efficiency. Our approach leverages optimism-based confidence sets to establish the first sample complexity guarantee for CTRL with general function approximation, showing that a near-optimal policy can be learned with a suboptimality gap that decays as N^{-1/2} using N measurements, up to a factor governed by the distributional Eluder dimensions of the reward and dynamic functions, respectively, capturing…
Taxonomy
Topics: Reinforcement Learning in Robotics · Muscle activation and electromyography studies · Elevator Systems and Control
Methods: Attention Is All You Need · Linear Layer · Residual Connection · Byte Pair Encoding · Dropout · Multi-Head Attention · Dense Connections · Layer Normalization · Diffusion · Softmax
