Continuous-time q-learning for mean-field control problems
Xiaoli Wei, Xiang Yu

TL;DR
This paper develops a continuous-time q-learning framework for mean-field control problems with entropy regularization, reveals two distinct q-functions, and proposes model-free algorithms, illustrated with examples.
Contribution
It introduces a novel continuous-time q-learning approach for mean-field control, identifying two related q-functions and devising algorithms for model-free learning.
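For orientation, here is the single-agent q-function of Jia and Zhou (2023) that this work extends to the mean-field setting. Under a policy π with value function J, the q-function is the first-order (rate) term in the small-Δt expansion of the conventional Q-function Q_{Δt}. The notation below follows the single-agent case only. The mean-field integrated q-function is described in the abstract as the analogous first-order approximation of the integrated Q-function, but its exact arguments are not reproduced here.

```latex
Q_{\Delta t}(t,x,a;\pi) = J(t,x;\pi) + q(t,x,a;\pi)\,\Delta t + o(\Delta t),
\qquad\text{i.e.}\qquad
q(t,x,a;\pi) = \lim_{\Delta t \to 0} \frac{Q_{\Delta t}(t,x,a;\pi) - J(t,x;\pi)}{\Delta t}.
```

As Δt shrinks, Q_{Δt} collapses to J and carries no information about the action. This is why continuous-time methods learn the rate q rather than Q itself.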
Findings
Two distinct q-functions are identified and related via an integral representation.
Model-free algorithms are proposed and validated through simulations.
Exact parameterizations of the optimal value function and q-functions are derived in examples.
Abstract
This paper studies q-learning, recently coined by Jia and Zhou (2023) as the continuous-time counterpart of Q-learning, for continuous-time McKean-Vlasov control problems in the setting of entropy-regularized reinforcement learning. In contrast to the single-agent control problem in Jia and Zhou (2023), the mean-field interaction of agents makes the definition of the q-function more subtle, and we reveal that two distinct q-functions naturally arise: (i) the integrated q-function (denoted by ), the first-order approximation of the integrated Q-function introduced in Gu, Guo, Wei and Xu (2023), which can be learnt via a weak martingale condition involving test policies; and (ii) the essential q-function (denoted by ), which is employed in the policy improvement iterations. We show that the two q-functions are related via an integral representation under all test…
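In entropy-regularized q-learning, a q-function drives policy improvement. In the single-agent theory of Jia and Zhou (2023), the improved policy is the Gibbs density π(a) ∝ exp(q(a)/γ), where γ is the entropy temperature. The sketch below applies that update on a one-dimensional action grid. It rests on several assumptions. The mean-field update is taken to have the same Gibbs form in the essential q-function, evaluated at a fixed time, state, and population distribution. The quadratic `toy_q` is hypothetical, chosen only for illustration. The function name `gibbs_policy` and the grid discretization are illustrative choices, not the authors' algorithm.

```python
import numpy as np


def gibbs_policy(q_values: np.ndarray, actions: np.ndarray, gamma: float) -> np.ndarray:
    """Density pi(a) proportional to exp(q(a) / gamma) on a uniform 1-D action grid.

    Assumption: the improved policy has Gibbs form in the q-function, as in the
    single-agent entropy-regularized setting of Jia and Zhou (2023).
    """
    logits = q_values / gamma
    logits = logits - logits.max()          # subtract max to avoid overflow in exp
    weights = np.exp(logits)
    da = actions[1] - actions[0]            # uniform grid spacing
    return weights / (weights.sum() * da)   # Riemann-sum normalization: integral of pi is 1


# Usage with a hypothetical quadratic q-function peaked at a = 1.0.
actions = np.linspace(-10.0, 10.0, 4001)    # grid spacing 0.005
da = actions[1] - actions[0]
gamma = 0.5
toy_q = -(actions - 1.0) ** 2               # hypothetical q(t, x, a) at fixed (t, x, mu)

pi = gibbs_policy(toy_q, actions, gamma)
mean = np.sum(actions * pi) * da
var = np.sum((actions - mean) ** 2 * pi) * da
print(f"mean ~ {mean:.4f}, variance ~ {var:.4f}")  # Gaussian: mean 1, variance gamma/2
```

A quadratic q-function yields a Gaussian policy with mean at the maximizer of q and variance γ/2. This is consistent with the Gaussian policies that typically appear in linear-quadratic examples of entropy-regularized control.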
Taxonomy
Topics: Adaptive Dynamic Programming Control · Reinforcement Learning in Robotics · Advanced Control Systems Optimization
Methods: Q-Learning
