A general Markov decision process formalism for action-state entropy-regularized reward maximization
Dmytro Grytskyy, Jorge Ramírez-Ruiz, Rubén Moreno-Bote

TL;DR
This paper introduces a unified dual-function formalism for entropy-regularized reward maximization in Markov decision processes. It turns constrained optimization problems with action, state, or mixed action-state entropy into unconstrained convex ones.
Contribution
It presents a general convex dual framework that transforms constrained entropy-regularization problems into unconstrained convex optimization problems, covering both pure and mixed entropy cases.
Findings
Unified formalism for action, state, and action-state entropy regularization.
Transforms constrained problems into unconstrained convex optimization.
Applicable to pure and mixed entropy scenarios.
Abstract
Previous work has separately addressed different forms of action, state, and action-state entropy regularization, pure exploration, and space occupation. These problems have become highly relevant for regularization, generalization, speeding up learning, and providing robust solutions at unprecedented levels. However, the existing solutions are heterogeneous, ranging from convex to non-convex optimization and from unconstrained to constrained optimization. Here we provide a general dual function formalism that transforms the constrained optimization problem into an unconstrained convex one for any mixture of action and state entropies. The cases of pure action entropy and pure state entropy are understood as limits of the mixture.
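For intuition about the pure action-entropy limit, the sketch below solves a small tabular MDP with standard soft value iteration. It maximizes the discounted sum of reward plus α times the policy entropy. It is not the paper's dual formalism and does not handle state entropy. The function names, the `P[s][a][s2]` / `R[s][a]` layout, and the parameters `gamma` and `alpha` are hypothetical choices for illustration.

```python
import math
from typing import List, Tuple

# Hypothetical tabular MDP layout (an assumption, not from the paper):
#   P[s][a][s2] = probability of moving from state s to s2 under action a
#   R[s][a]     = immediate reward for taking action a in state s
Transitions = List[List[List[float]]]
Rewards = List[List[float]]


def logsumexp(xs: List[float]) -> float:
    """Numerically stable log(sum(exp(x)))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def q_values(P: Transitions, R: Rewards, gamma: float, V: List[float]) -> List[List[float]]:
    """Q(s,a) = r(s,a) + gamma * sum_s2 P(s2|s,a) V(s2)."""
    return [[R[s][a] + gamma * sum(p * V[s2] for s2, p in enumerate(P[s][a]))
             for a in range(len(R[s]))]
            for s in range(len(R))]


def soft_value_iteration(P: Transitions, R: Rewards, gamma: float, alpha: float,
                         tol: float = 1e-10,
                         max_iter: int = 10_000) -> Tuple[List[float], List[List[float]]]:
    """Maximize E[sum_t gamma^t (r_t + alpha * H(pi(.|s_t)))] on a tabular MDP.

    Returns the soft value function V and the softmax policy pi[s][a].
    """
    V = [0.0] * len(R)
    for _ in range(max_iter):
        Q = q_values(P, R, gamma, V)
        # Soft Bellman backup: V(s) = alpha * log sum_a exp(Q(s,a) / alpha)
        V_new = [alpha * logsumexp([q / alpha for q in Q[s]]) for s in range(len(R))]
        delta = max(abs(v1 - v0) for v1, v0 in zip(V_new, V))
        V = V_new
        if delta < tol:  # the backup is a gamma-contraction, so this converges
            break
    # Optimal entropy-regularized policy: pi(a|s) = exp((Q(s,a) - V(s)) / alpha)
    Q = q_values(P, R, gamma, V)
    pi = [[math.exp((q - V[s]) / alpha) for q in Q[s]] for s in range(len(R))]
    return V, pi
```

As a usage example, take one state with two self-looping actions rewarded 1 and 0, with `gamma=0.5` and `alpha=1.0`. The fixed point is V = 2 log(1 + e), and the policy picks the rewarded action with probability e / (1 + e). Shrinking `alpha` toward 0 recovers the greedy, unregularized value 1 / (1 - γ) = 2.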
