On the connection between Bregman divergence and value in regularized Markov decision processes
Brendan O'Donoghue

TL;DR
This short note links the Bregman divergence between the current and optimal policies to the suboptimality of the current value function in regularized Markov decision processes, with implications for reinforcement learning and regret analysis.
Contribution
It connects the Bregman divergence from the current policy to the optimal policy with the value suboptimality of the current policy in regularized MDPs.
Findings
Derives a relationship between the Bregman divergence from the current policy to the optimal policy and the suboptimality of the current value function.
Notes implications for multi-task and offline reinforcement learning.
Notes implications for regret analysis under function approximation.
Abstract
In this short note we derive a relationship between the Bregman divergence from the current policy to the optimal policy and the suboptimality of the current value function in a regularized Markov decision process. This result has implications for multi-task reinforcement learning, offline reinforcement learning, and regret analysis under function approximation, among others.
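As an illustration of the kind of relationship the abstract describes, here is a minimal numerical sketch for one special case: an entropy-regularized tabular MDP, where the Bregman divergence generated by negative entropy is the KL divergence. In that case a standard calculation gives V*(s) - V^pi(s) = tau * E_pi[ sum_t gamma^t KL(pi(.|s_t) || pi*(.|s_t)) | s_0 = s ]. The random MDP, the temperature tau, and all function names below are assumptions made for this example, not taken from the note, whose general statement and notation may differ.

```python
import numpy as np

def soft_optimal(P, r, gamma, tau, iters=2000):
    """Soft value iteration for an entropy-regularized MDP.
    P: (S, A, S) transition tensor, r: (S, A) reward table."""
    V = np.zeros(r.shape[0])
    for _ in range(iters):
        Q = r + gamma * (P @ V)                      # (S, A)
        m = Q.max(axis=1)                            # stabilize log-sum-exp
        V = m + tau * np.log(np.exp((Q - m[:, None]) / tau).sum(axis=1))
    Q = r + gamma * (P @ V)
    pi_star = np.exp((Q - V[:, None]) / tau)         # softmax policy
    return V, pi_star / pi_star.sum(axis=1, keepdims=True)

def evaluate(P, r, gamma, tau, pi):
    """Exact regularized value of a fixed policy pi via a linear solve."""
    P_pi = np.einsum("sa,sat->st", pi, P)            # state-to-state kernel
    entropy = -(pi * np.log(pi)).sum(axis=1)
    r_pi = (pi * r).sum(axis=1) + tau * entropy      # reward plus entropy bonus
    V = np.linalg.solve(np.eye(len(r_pi)) - gamma * P_pi, r_pi)
    return V, P_pi

# Hypothetical random MDP used only for this check.
rng = np.random.default_rng(0)
S, A, gamma, tau = 5, 3, 0.9, 0.5
P = rng.dirichlet(np.ones(S), size=(S, A))           # shape (S, A, S)
r = rng.uniform(size=(S, A))
pi = rng.dirichlet(np.ones(A), size=S)               # arbitrary current policy

V_star, pi_star = soft_optimal(P, r, gamma, tau)
V_pi, P_pi = evaluate(P, r, gamma, tau, pi)

# Per-state KL(pi || pi*), i.e. the Bregman divergence of negative entropy.
kl = (pi * np.log(pi / pi_star)).sum(axis=1)

# Discounted expected sum of tau * KL along trajectories of pi.
gap_from_kl = np.linalg.solve(np.eye(S) - gamma * P_pi, tau * kl)
gap = V_star - V_pi

print("max abs difference:", np.max(np.abs(gap - gap_from_kl)))
```

The two vectors should agree up to floating-point error, and the gap is nonnegative in every state because KL is nonnegative. Other regularizers would replace the KL term with the Bregman divergence of the corresponding convex function.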
Taxonomy
Topics: Decision-Making and Behavioral Economics · Experimental Behavioral Economics Studies
