Temporal-Differential Learning in Continuous Environments
Tao Bian, Zhong-Ping Jiang

TL;DR
This paper introduces temporal-differential learning, a reinforcement learning approach for continuous environments, and supports it with both theoretical analysis and empirical evidence.
Contribution
The paper presents the temporal-differential learning framework and develops two methods within it: continuous-time least squares policy evaluation (CT-LSPE) and continuous-time temporal-differential (CT-TD) learning.
Findings
The proposed methods outperform traditional approaches in continuous environments.
Theoretical analysis confirms convergence properties.
Empirical results demonstrate improved policy evaluation accuracy.
Abstract
In this paper, a new reinforcement learning (RL) method known as the method of temporal differential is introduced. In contrast to the traditional temporal-difference learning method, it plays a crucial role in developing novel RL techniques for continuous environments. In particular, the continuous-time least squares policy evaluation (CT-LSPE) and the continuous-time temporal-differential (CT-TD) learning methods are developed. Both theoretical and empirical evidence is provided to demonstrate the effectiveness of the proposed temporal-differential learning methodology.
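For orientation, the traditional temporal-difference method that the abstract contrasts against can be sketched as tabular TD(0) policy evaluation in discrete time. This is a minimal illustration of that classical baseline only, not the paper's CT-LSPE or CT-TD algorithms, whose details are not given here. The two-state deterministic chain, the function name `td0_evaluate`, and the parameter values are all assumptions chosen for illustration.

```python
def td0_evaluate(transitions, gamma, alpha, num_steps, start_state=0):
    """Tabular TD(0) policy evaluation on a deterministic Markov reward process.

    transitions maps each state to a (next_state, reward) pair.
    This is the classical discrete-time method, shown only as a baseline.
    """
    values = {state: 0.0 for state in transitions}
    state = start_state
    for _ in range(num_steps):
        next_state, reward = transitions[state]
        # Temporal-difference error: one-step bootstrapped target minus current estimate.
        td_error = reward + gamma * values[next_state] - values[state]
        values[state] += alpha * td_error
        state = next_state
    return values


# Hypothetical toy chain: state 0 -> state 1 with reward 1, state 1 -> state 0 with reward 0.
chain = {0: (1, 1.0), 1: (0, 0.0)}
values = td0_evaluate(chain, gamma=0.5, alpha=0.1, num_steps=2000)
```

For this toy chain the Bellman equations V(0) = 1 + 0.5 V(1) and V(1) = 0.5 V(0) give V(0) = 4/3 and V(1) = 2/3, which the estimates approach. The update relies on a fixed discrete time step, which is the setting the abstract distinguishes from continuous environments.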
Taxonomy
Topics: Reinforcement Learning in Robotics · Adaptive Dynamic Programming Control · Extremum Seeking Control Systems
