Continuous-time Risk-sensitive Reinforcement Learning via Quadratic Variation Penalty
Yanwei Jia

TL;DR
This paper introduces a novel risk-sensitive reinforcement learning framework in continuous time that incorporates quadratic variation penalties, enabling more robust decision-making under uncertainty and extending existing algorithms to risk-aware scenarios.
Contribution
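It develops a martingale-based characterization of risk-sensitive RL in which the quadratic variation of the value process enters as a penalty term, adapts existing non-risk-sensitive RL algorithms to the risk-sensitive setting, and proves convergence of the algorithm for Merton's investment problem.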
Findings
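Risk sensitivity is incorporated through a single additional penalty term: the quadratic variation of the value process, which captures the variability of the value-to-go along the trajectory.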
Convergence is proven for the algorithm in Merton's investment problem.
Simulation results show improved finite-sample performance in control tasks.
Abstract
This paper studies continuous-time risk-sensitive reinforcement learning (RL) under the entropy-regularized, exploratory diffusion process formulation with the exponential-form objective. The risk-sensitive objective arises either as the agent's risk attitude or as a distributionally robust approach against model uncertainty. Owing to the martingale perspective in Jia and Zhou (J Mach Learn Res 24(161): 1--61, 2023), the risk-sensitive RL problem is shown to be equivalent to ensuring the martingale property of a process involving both the value function and the q-function, augmented by an additional penalty term: the quadratic variation of the value process, capturing the variability of the value-to-go along the trajectory. This characterization allows for the straightforward adaptation of existing RL algorithms developed for non-risk-sensitive scenarios to incorporate risk…
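As a rough illustration of the two quantities the abstract names, the sketch below computes an exponential-form objective from sampled returns and the realized quadratic variation of a discretely sampled value path. The function names, the parameter `eps`, and the discretization are assumptions made for this example; it is not the paper's algorithm and does not reproduce its exact penalty coefficient.

```python
import math


def realized_quadratic_variation(values):
    """Sum of squared increments of a discretely sampled path.

    Illustrative stand-in for the quadratic variation of a value
    process observed on a time grid (hypothetical helper).
    """
    return sum((b - a) ** 2 for a, b in zip(values, values[1:]))


def exponential_objective(returns, eps):
    """Sample estimate of (1/eps) * log E[exp(eps * R)].

    Uses the log-sum-exp trick for numerical stability. With eps == 0
    the expression reduces to the plain sample mean. `eps` is an
    assumed name for the risk-sensitivity coefficient.
    """
    if eps == 0:
        return sum(returns) / len(returns)
    scaled = [eps * r for r in returns]
    m = max(scaled)
    log_mean_exp = m + math.log(sum(math.exp(s - m) for s in scaled) / len(scaled))
    return log_mean_exp / eps


# A path that moves 0 -> 1 -> 3 has squared increments 1 and 4.
path = [0.0, 1.0, 3.0]
qv = realized_quadratic_variation(path)

# Two equally likely returns with mean 1.0: a negative eps scores them
# below the mean, a positive eps above it (Jensen's inequality).
sampled_returns = [0.0, 2.0]
averse = exponential_objective(sampled_returns, -1.0)
seeking = exponential_objective(sampled_returns, 1.0)
```

In this sketch a more variable path yields a larger realized quadratic variation, which is the kind of trajectory-level variability the penalty term in the abstract is described as capturing.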
Taxonomy
Topics: Reinforcement Learning in Robotics · Traffic control and management · Muscle activation and electromyography studies
Methods: Q-Learning · Diffusion
