Continuous-time Risk-sensitive Reinforcement Learning via Quadratic Variation Penalty

Yanwei Jia

arXiv:2404.12598·cs.LG·March 17, 2026

TL;DR

This paper introduces a novel risk-sensitive reinforcement learning framework in continuous time that incorporates quadratic variation penalties, enabling more robust decision-making under uncertainty and extending existing algorithms to risk-aware scenarios.

Contribution

It develops a martingale-based characterization of risk-sensitive RL in which the quadratic variation of the value process acts as a penalty, adapts existing non-risk-sensitive algorithms to the risk-sensitive setting, and proves convergence for Merton's investment problem.

Findings

01. Risk sensitivity enters as a penalty on the quadratic variation of the value process.

02. Convergence of the algorithm is proven for Merton's investment problem.

03. Simulation results show improved finite-sample performance in control tasks.

Abstract

This paper studies continuous-time risk-sensitive reinforcement learning (RL) under the entropy-regularized, exploratory diffusion process formulation with the exponential-form objective. The risk-sensitive objective arises either as the agent's risk attitude or as a distributionally robust approach against model uncertainty. Owing to the martingale perspective in Jia and Zhou (J Mach Learn Res 24(161): 1-61, 2023), the risk-sensitive RL problem is shown to be equivalent to ensuring the martingale property of a process involving both the value function and the q-function, augmented by an additional penalty term: the quadratic variation of the value process, capturing the variability of the value-to-go along the trajectory. This characterization allows for the straightforward adaptation of existing RL algorithms developed for non-risk-sensitive scenarios to incorporate risk…
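To make the two ingredients named in the abstract concrete, the following is a minimal Python sketch, not the paper's algorithm. It shows (a) a sample estimate of an exponential-form objective, (b) the realized quadratic variation of a discretely sampled value path, and (c) a TD-style increment with a squared-increment penalty added. All function names, the discretization, and the `penalty_weight` parameter are assumptions introduced here for illustration; the paper's exact coefficients and the q-function term are not reproduced.

```python
import math


def entropic_risk(samples, eps):
    """Sample estimate of (1/eps) * log E[exp(eps * X)] for eps != 0.

    This is one common exponential-form objective; whether it matches
    the paper's exact normalization is an assumption.
    """
    # Shift by the maximum exponent (log-sum-exp) for numerical stability.
    m = max(eps * x for x in samples)
    mean_exp = sum(math.exp(eps * x - m) for x in samples) / len(samples)
    return (m + math.log(mean_exp)) / eps


def realized_quadratic_variation(path):
    """Sum of squared increments of a path sampled on a time grid."""
    return sum((b - a) ** 2 for a, b in zip(path, path[1:]))


def penalized_increment(v_now, v_next, running_term, dt, penalty_weight):
    """Hypothetical one-step increment of a penalized process.

    Standard TD-style part: change in value plus running term times dt.
    Added part: penalty_weight times the squared value increment, a
    discrete stand-in for the quadratic variation of the value process.
    """
    dv = v_next - v_now
    return dv + running_term * dt + penalty_weight * dv ** 2
```

In this sketch, setting `penalty_weight` to zero recovers the ordinary TD-style increment, which mirrors the abstract's point that existing non-risk-sensitive algorithms can be adapted by adding a single penalty term. For a positive `eps` the entropic estimate is at least the sample mean, and for a negative `eps` it is at most the sample mean.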

Taxonomy

Topics: Reinforcement Learning in Robotics · Traffic control and management · Muscle activation and electromyography studies

Methods: Q-Learning · Diffusion