Off Policy Risk Sensitive Reinforcement Learning Based Optimal Tracking Control with Prescribe Performances
C. Li, Y. Wang, F. Liu, and M. Buss

TL;DR
This paper introduces an off-policy reinforcement learning control method that ensures prescribed performance in optimal tracking tasks, using risk-sensitive penalties and experience data for stable, practical implementation.
Contribution
It develops a novel off-policy RL framework with risk-sensitive constraints and experience-based critic learning, guaranteeing stability and convergence for prescribed performance tracking.
Findings
The proposed method achieves the prescribed performance of all states during learning.
The critic weight update law guarantees weight convergence without external excitation.
Simulation confirms effectiveness of the control strategy.
Abstract
An off-policy reinforcement-learning-based control strategy is developed for the optimal tracking control problem to achieve the prescribed performance of full states during the learning process. The optimal tracking control problem is converted into an optimal regulation problem based on an auxiliary system. The requirements of prescribed performance are transformed into constraint satisfaction problems that are handled by risk-sensitive state penalty terms under an optimization framework. To obtain approximate solutions of the Hamilton-Jacobi-Bellman equation, an off-policy adaptive critic learning architecture is developed by using current data and experience data together. By using experience data, the proposed weight estimation update law of the critic learning agent guarantees weight convergence to the actual value. This technique enjoys practicability compared with common…
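The role of experience data in the critic update can be illustrated with a small sketch. This is not the paper's update law: it assumes a critic that is linear in its weights, a normalized gradient step on a squared residual, and hypothetical names (`critic_step`, `train`, `w_true`) and numbers chosen only for illustration. The current regressor is deliberately held constant, so on its own it cannot identify the second weight; adding a few stored samples whose regressors span the weight space restores convergence.

```python
def critic_step(w, current, memory, lr=0.5):
    """One normalized-gradient step on the squared residual, summed over
    the current sample and every stored (experience) sample."""
    grad = [0.0] * len(w)
    for phi, target in [current] + memory:
        err = sum(wi * pi for wi, pi in zip(w, phi)) - target
        norm = 1.0 + sum(pi * pi for pi in phi)
        for i, pi in enumerate(phi):
            grad[i] += err * pi / norm
    return [wi - lr * gi for wi, gi in zip(w, grad)]


def train(memory_regressors, steps=2000):
    """Run the update with a fixed current regressor plus optional memory."""
    w_true = [1.0, -2.0]  # hypothetical ideal critic weights (assumption)

    def sample(phi):
        # Noise-free target generated by the ideal weights.
        return (phi, sum(a * b for a, b in zip(w_true, phi)))

    current = sample([1.0, 0.0])  # current data never excites w[1]
    stored = [sample(phi) for phi in memory_regressors]
    w = [0.0, 0.0]
    for _ in range(steps):
        w = critic_step(w, current, stored)
    return w, w_true


w_no_memory, w_true = train([])
w_with_memory, _ = train([[0.0, 1.0], [1.0, 1.0]])
print("ideal weights:      ", w_true)
print("current data only:  ", w_no_memory)
print("with experience data:", w_with_memory)
```

In this toy setting the run with current data alone recovers only the first weight, while the run that also replays the two stored samples recovers both, which mirrors the abstract's claim that experience data can stand in for external excitation.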
Taxonomy
Topics: Adaptive Dynamic Programming Control · Adaptive Control of Nonlinear Systems · Extremum Seeking Control Systems
