Gaussian-Mixture-Model Q-Functions for Policy Iteration in Reinforcement Learning
Minh Vu, Konstantinos Slavakis

TL;DR
This paper proposes Gaussian mixture models (GMMs) as direct surrogates for Q-functions in reinforcement learning. Their parameters are learned by Riemannian optimization in the policy-evaluation step, and the resulting models come with theoretical guarantees and are reported to perform competitively without using experience data.
Contribution
It introduces GMM-QFs, a function-approximation scheme for Q-functions whose parameters (mixing weights, Gaussian mean vectors, and covariance matrices) are learned by Riemannian optimization, and it shows that these models are universal approximators.
Findings
GMM-QFs are shown to be universal approximators over a broad class of functions.
They achieve competitive or superior performance on benchmark RL tasks.
They operate efficiently without requiring experience data.
Abstract
Unlike their conventional use as estimators of probability density functions in reinforcement learning (RL), this paper introduces a novel function-approximation role for Gaussian mixture models (GMMs) as direct surrogates for Q-function losses. These parametric models, termed GMM-QFs, possess substantial representational capacity, as they are shown to be universal approximators over a broad class of functions. They are further embedded within Bellman residuals, where their learnable parameters -- a fixed number of mixing weights, together with Gaussian mean vectors and covariance matrices -- are inferred from data via optimization on a Riemannian manifold. This geometric perspective on the parameter space naturally incorporates Riemannian optimization into the policy-evaluation step of standard policy-iteration frameworks. Rigorous theoretical results are established, and supporting…
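The parametrization described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `gmm_qf` and `bellman_residual`, the unnormalized Gaussian kernel, the stacking of state and action into one vector `z`, and the mean-squared form of the residual are all assumptions made for illustration. The Riemannian optimization step is not shown.

```python
import numpy as np

def gmm_qf(z, weights, means, covs):
    """Evaluate Q(z) = sum_k w_k * exp(-0.5 (z - m_k)^T C_k^{-1} (z - m_k)).

    z is the stacked state-action vector. The kernel is left unnormalized,
    which is an assumption of this sketch, not a detail taken from the paper.
    """
    z = np.asarray(z, dtype=float)
    q = 0.0
    for w, m, C in zip(weights, means, covs):
        d = z - m
        # Solve C x = d instead of forming the inverse of C explicitly.
        q += w * np.exp(-0.5 * d @ np.linalg.solve(C, d))
    return float(q)

def bellman_residual(transitions, policy, gamma, weights, means, covs):
    """Mean squared Bellman residual over (s, a, r, s_next) samples.

    `policy` maps a state to an action; it is held fixed, as in the
    policy-evaluation step of policy iteration.
    """
    total = 0.0
    for s, a, r, s_next in transitions:
        z = np.concatenate([s, a])
        z_next = np.concatenate([s_next, policy(s_next)])
        target = r + gamma * gmm_qf(z_next, weights, means, covs)
        total += (gmm_qf(z, weights, means, covs) - target) ** 2
    return total / len(transitions)
```

In this sketch the number of components is fixed by the length of `weights`, and each entry of `covs` must be symmetric positive definite. That constraint on the covariance matrices is what makes a manifold view of the parameter space natural.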
Taxonomy
Topics: Reinforcement Learning in Robotics · Advanced Multi-Objective Optimization Algorithms · Gaussian Processes and Bayesian Inference
