On the connection between Bregman divergence and value in regularized Markov decision processes

Brendan O'Donoghue

arXiv:2210.12160·cs.LG·November 8, 2022

TL;DR

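This short note derives a relationship between two quantities in a regularized Markov decision process: the Bregman divergence from the current policy to the optimal policy, and the suboptimality of the current value function. The result bears on multi-task and offline reinforcement learning and on regret analysis under function approximation.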

Contribution

It derives a relationship between the Bregman divergence from the current policy to the optimal policy and the suboptimality of the current value function in regularized MDPs.

Findings

01

Derives a relationship linking the Bregman divergence from the current policy to the optimal policy with the suboptimality of the current value function.

02

Implications for multi-task and offline reinforcement learning are discussed.

03

The result is also relevant to regret analysis under function approximation.

Abstract

In this short note we derive a relationship between the Bregman divergence from the current policy to the optimal policy and the suboptimality of the current value function in a regularized Markov decision process. This result has implications for multi-task reinforcement learning, offline reinforcement learning, and regret analysis under function approximation, among others.
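The abstract does not state the regularizer or the exact form of the relationship, so the following is only a minimal sketch of the general definition of a Bregman divergence, D_Ω(x, y) = Ω(x) − Ω(y) − ⟨∇Ω(y), x − y⟩, applied to two action distributions at a single state. The choice of negative entropy as Ω, the argument order, and the names `pi_current` and `pi_optimal` are assumptions made for illustration and are not taken from the note. With negative entropy on the probability simplex, the Bregman divergence reduces to the KL divergence, which the sketch checks numerically.

```python
import math

def neg_entropy(p):
    """Negative Shannon entropy; an assumed example regularizer (entries must be > 0)."""
    return sum(pi * math.log(pi) for pi in p)

def neg_entropy_grad(p):
    """Gradient of negative entropy: d/dp_i sum_j p_j log p_j = log p_i + 1."""
    return [math.log(pi) + 1.0 for pi in p]

def bregman(omega, grad_omega, x, y):
    """Bregman divergence D_omega(x, y) = omega(x) - omega(y) - <grad omega(y), x - y>."""
    g = grad_omega(y)
    inner = sum(gi * (xi - yi) for gi, xi, yi in zip(g, x, y))
    return omega(x) - omega(y) - inner

def kl(x, y):
    """KL divergence KL(x || y) for strictly positive distributions."""
    return sum(xi * math.log(xi / yi) for xi, yi in zip(x, y))

# Hypothetical action distributions at one state (illustrative values only).
pi_current = [0.7, 0.2, 0.1]
pi_optimal = [0.5, 0.3, 0.2]

d = bregman(neg_entropy, neg_entropy_grad, pi_current, pi_optimal)
print("Bregman divergence:", d)
print("KL divergence:     ", kl(pi_current, pi_optimal))
```

Other strictly convex regularizers can be swapped in by passing a different `omega` and `grad_omega`; the divergence is zero when both arguments coincide and non-negative otherwise.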

Taxonomy

Topics: Decision-Making and Behavioral Economics · Experimental Behavioral Economics Studies