Continuous-time q-learning for mean-field control problems

Xiaoli Wei; Xiang Yu

arXiv:2306.16208·cs.LG·November 4, 2024·1 cites

Open Access

TL;DR

This paper develops a continuous-time q-learning framework for mean-field control problems with entropy regularization, revealing two distinct q-functions and proposing model-free algorithms with demonstrated examples.

Contribution

It introduces a novel continuous-time q-learning approach for mean-field control, identifying two related q-functions and devising algorithms for model-free learning.

Findings

01 Two distinct q-functions are identified and related via an integral representation.

02 Model-free algorithms are proposed and validated through simulations.

03 Exact parameterization of the optimal value function and the q-functions is demonstrated in examples.

Abstract

This paper studies q-learning, recently coined by Jia and Zhou (2023) as the continuous-time counterpart of Q-learning, for continuous-time McKean-Vlasov control problems in the setting of entropy-regularized reinforcement learning. In contrast to the single agent's control problem in Jia and Zhou (2023), the mean-field interaction of agents renders the definition of the q-function more subtle, and we reveal that two distinct q-functions naturally arise: (i) the integrated q-function (denoted by $q$), the first-order approximation of the integrated Q-function introduced in Gu, Guo, Wei and Xu (2023), which can be learnt by a weak martingale condition involving test policies; and (ii) the essential q-function (denoted by $q_{e}$), which is employed in the policy improvement iterations. We show that the two q-functions are related via an integral representation under all test…
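To illustrate what a policy improvement step driven by a q-function can look like under entropy regularization, the sketch below builds a Gibbs (softmax) policy from q-values on a discrete action grid. This is a generic pattern from entropy-regularized reinforcement learning, not the paper's algorithm: the action grid, the toy values standing in for $q_{e}$ at a fixed time, state, and distribution, the `temperature` parameter, and the function name `gibbs_policy` are all assumptions made for illustration.

```python
import math


def gibbs_policy(q_values, temperature):
    """Return probabilities proportional to exp(q / temperature).

    q_values: hypothetical q-function values, one per action on a grid.
    temperature: assumed entropy-regularization weight (must be > 0).
    """
    # Subtract the maximum before exponentiating for numerical stability.
    q_max = max(q_values)
    weights = [math.exp((q - q_max) / temperature) for q in q_values]
    total = sum(weights)
    return [w / total for w in weights]


# Toy stand-in for q_e on a 5-point action grid (illustrative values only).
actions = [-1.0, -0.5, 0.0, 0.5, 1.0]
q_e_values = [-(a - 0.5) ** 2 for a in actions]

policy = gibbs_policy(q_e_values, temperature=0.1)
```

A lower temperature concentrates the policy on the action with the largest q-value, while a higher temperature spreads probability more evenly across the grid, which is the exploration effect that entropy regularization is meant to provide.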

Taxonomy

Topics: Adaptive Dynamic Programming Control · Reinforcement Learning in Robotics · Advanced Control Systems Optimization

Methods: Q-Learning