Non-convex entropic mean-field optimization via Best Response flow
Razvan-Andrei Lascu, Mateusz B. Majka

TL;DR
This paper investigates non-convex entropy-regularized optimization on probability measures using the Best Response flow, establishes conditions for its convergence, and applies the results to policy optimization in reinforcement learning.
Contribution
It extends the Best Response flow to non-convex functionals by identifying conditions under which the Best Response operator is a contraction and its fixed point is the global minimizer, thereby relaxing the usual convexity assumptions.
Findings
The Best Response operator can be made a contraction by an appropriate choice of regularizer.
Its unique fixed point is the global minimizer.
Applications to policy optimization in reinforcement learning are demonstrated.
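To make the first two findings concrete, here is a hypothetical worked example of ours (not taken from the paper) in which the contraction constant can be computed in closed form. It assumes the standard form of the Best Response operator for the objective F^σ = F + σ KL(·|π), namely Ψ(m) ∝ exp(-(1/σ) δF/δm(m,·)) π, a standard Gaussian reference π, and a concave (hence non-convex) toy functional whose degree of non-convexity is the parameter κ > 0. The symbols M, κ, σ and W_1 are introduced here for illustration.

```latex
% Hypothetical toy example: \pi = \mathcal{N}(0,1), \kappa > 0, \sigma > 0.
\begin{aligned}
F(m) &= -\tfrac{\kappa}{2}\, M(m)^2, \qquad M(m) := \int_{\mathbb{R}} x\, m(dx),
  \qquad F^{\sigma}(m) = F(m) + \sigma\, \mathrm{KL}(m \,|\, \pi),\\
\frac{\delta F}{\delta m}(m, x) &= -\kappa\, M(m)\, x,\\
\Psi(m)(dx) &\propto \exp\Big(-\tfrac{1}{\sigma}\,\frac{\delta F}{\delta m}(m, x)\Big)\,\pi(dx)
  \;\Longrightarrow\; \Psi(m) = \mathcal{N}\big(\tfrac{\kappa}{\sigma} M(m),\, 1\big),\\
W_1\big(\Psi(m), \Psi(m')\big) &= \tfrac{\kappa}{\sigma}\,\big|M(m) - M(m')\big|
  \;\le\; \tfrac{\kappa}{\sigma}\, W_1(m, m').
\end{aligned}
```

The last inequality is Kantorovich-Rubinstein duality with the 1-Lipschitz test function x ↦ x. So in this toy, Ψ is a strict contraction exactly when σ > κ: the regularization must dominate the non-convexity. Its unique fixed point satisfies M = (κ/σ)M, i.e. M = 0, so m* = π. It is also the unique global minimizer: for m with mean a, the Donsker-Varadhan inequality with test function a·x gives KL(m|π) ≥ a²/2, hence F^σ(m) ≥ (σ - κ)a²/2 ≥ 0 = F^σ(π), with equality only at m = π.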
Abstract
We study the problem of minimizing non-convex functionals on the space of probability measures, regularized by the relative entropy (KL divergence) with respect to a fixed reference measure, as well as the corresponding problem of solving entropy-regularized non-convex-non-concave min-max problems. We utilize the Best Response flow (also known in the literature as the fictitious play flow) and study how its convergence is influenced by the relation between the degree of non-convexity of the functional under consideration, the regularization parameter and the tail behaviour of the reference measure. In particular, we demonstrate how to choose the regularizer, given the non-convex functional, so that the Best Response operator becomes a contraction with respect to a Wasserstein distance, which ensures the existence of its unique fixed point that is then shown to be the unique…
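The abstract describes the Best Response flow without spelling it out. Below is a minimal numerical sketch, assuming the standard fictitious-play form used in the entropic mean-field literature: the flow dm/dt = Psi(m) - m, where for the objective F(m) + sigma KL(m | pi) the Best Response Psi(m) has density proportional to exp(-(1/sigma) dF/dm(m, x)) with respect to pi. Everything concrete is a hypothetical choice of ours, not the authors' setup or code: the grid on [-6, 6], the standard Gaussian reference pi, the non-convex toy functional F(m) = ∫ V dm - (kappa/2)(∫ x dm)^2 with double well V(x) = (x^2 - 1)^2/4, the explicit Euler step, and the helper names (flat_derivative, best_response, best_response_flow, wasserstein_1). In this toy, Psi depends on m only through its mean, which is 1-Lipschitz in W1, so sigma large enough relative to kappa makes Psi a W1-contraction, and two different starting measures should reach the same fixed point.

```python
import math

# Hypothetical toy setup (not from the paper): probability measures are
# represented by their masses on a uniform grid over [-6, 6].
L_BOX, N = 6.0, 401
GRID = [-L_BOX + 2.0 * L_BOX * i / (N - 1) for i in range(N)]
H = GRID[1] - GRID[0]


def mean(m):
    """First moment of a grid measure."""
    return sum(w * x for w, x in zip(m, GRID))


def flat_derivative(m, kappa):
    """Flat derivative of the hypothetical non-convex functional
    F(m) = int V dm - (kappa / 2) * (int x dm)^2, with double well
    V(x) = (x^2 - 1)^2 / 4:  dF/dm(m, x) = V(x) - kappa * mean(m) * x."""
    a = mean(m)
    return [0.25 * (x * x - 1.0) ** 2 - kappa * a * x for x in GRID]


def best_response(m, sigma, kappa):
    """Assumed Best Response operator for F + sigma * KL(. | pi):
    Psi(m) proportional to exp(-dF/dm(m, x) / sigma) * pi(x), with pi a
    standard Gaussian reference; computed in log space, then normalized."""
    d = flat_derivative(m, kappa)
    logw = [-di / sigma - 0.5 * x * x for di, x in zip(d, GRID)]
    top = max(logw)
    w = [math.exp(lw - top) for lw in logw]
    z = sum(w)
    return [wi / z for wi in w]


def wasserstein_1(p, q):
    """W1 distance between two grid measures in 1D: integral of |CDF_p - CDF_q|."""
    cp = cq = total = 0.0
    for p_i, q_i in zip(p[:-1], q[:-1]):
        cp += p_i
        cq += q_i
        total += abs(cp - cq) * H
    return total


def best_response_flow(m0, sigma, kappa, dt=0.1, steps=500):
    """Explicit Euler scheme for dm/dt = Psi(m) - m. For dt <= 1 each step is a
    convex combination of two probability vectors, so m stays a probability vector."""
    m = list(m0)
    for _ in range(steps):
        psi = best_response(m, sigma, kappa)
        m = [(1.0 - dt) * mi + dt * si for mi, si in zip(m, psi)]
    return m


if __name__ == "__main__":
    # Two different initial conditions: point masses at x = 3 and x = -3.
    right = [0.0] * N
    right[300] = 1.0
    left = [0.0] * N
    left[100] = 1.0
    a = best_response_flow(right, sigma=2.0, kappa=0.5)
    b = best_response_flow(left, sigma=2.0, kappa=0.5)
    print("W1 between the two limits:", wasserstein_1(a, b))
    print("fixed-point residual W1(Psi(m), m):", wasserstein_1(best_response(a, 2.0, 0.5), a))
    print("mean of the limit:", mean(a))
```

The convex-combination update is chosen because W1 is jointly convex under mixtures: if Psi contracts W1 with constant c < 1, each Euler step contracts with constant 1 - dt(1 - c), so the iterates approach the fixed point of Psi. The sketch makes no claim about the regime where sigma is small relative to kappa.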
Taxonomy
Topics: Reinforcement Learning in Robotics · Stochastic Gradient Optimization Techniques · Optimization and Variational Analysis
Methods: Softmax
