Consensus Based Stochastic Control
Liyao Lyu, Jingrun Chen

TL;DR
This paper introduces a gradient-free reinforcement learning algorithm using consensus-based optimization techniques to efficiently solve high-dimensional stochastic control problems, demonstrating scalability and convergence properties.
Contribution
It develops M-CBO and Adam-CBO frameworks that optimize policies via value function estimates, reducing variance and improving convergence in complex environments.
Findings
Algorithms accurately solve high-dimensional control problems.
Methods demonstrate scalability across various problem sizes.
Convergence is proven theoretically under certain conditions.
Abstract
We propose a gradient-free deep reinforcement learning algorithm to solve high-dimensional, finite-horizon stochastic control problems. Although the recently developed deep reinforcement learning framework has achieved great success in solving these problems, direct estimation of policy gradients from Monte Carlo sampling often suffers from high variance. To address this, we introduce the Momentum Consensus-Based Optimization (M-CBO) and Adaptive Momentum Consensus-Based Optimization (Adam-CBO) frameworks. These methods optimize policies using Monte Carlo estimates of the value function, rather than its gradients. Adjustable Gaussian noise supports efficient exploration, helping the algorithm converge to optimal policies in complex, nonconvex environments. Numerical results confirm the accuracy and scalability of our approach across various problem dimensions and show the potential for…
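The gradient-free idea in the abstract can be illustrated with a minimal sketch of a generic consensus-based optimization loop. This is not the authors' M-CBO or Adam-CBO: the momentum rule is a simplified stand-in, and the toy linear-quadratic problem, the function names (`mc_cost`, `consensus_point`, `momentum_cbo`), and all hyperparameters are assumptions chosen for illustration. Each particle is a candidate policy (a pair of feedback gains), scored only by a Monte Carlo estimate of its expected cost; no gradient of that cost is ever computed.

```python
import numpy as np


def mc_cost(thetas, noise, x0=1.0, horizon=1.0, sigma=0.3):
    """Monte Carlo cost estimate for each particle's policy.

    Toy problem (assumed, not from the paper): dx = u dt + sigma dW per
    coordinate, running cost |x|^2 + |u|^2, terminal cost |x_T|^2, and a
    linear feedback policy u_i = -theta_i * x_i.
    """
    n_paths, n_steps, dim = noise.shape
    dt = horizon / n_steps
    gains = thetas[:, None, :]                         # (N, 1, dim)
    x = np.full((thetas.shape[0], n_paths, dim), x0)   # (N, P, dim)
    cost = np.zeros(x.shape[:2])                       # (N, P)
    for k in range(n_steps):
        u = -gains * x
        cost += (x**2 + u**2).sum(axis=-1) * dt
        x = x + u * dt + sigma * np.sqrt(dt) * noise[:, k, :]
    cost += (x**2).sum(axis=-1)
    return cost.mean(axis=1)                           # average over paths


def consensus_point(thetas, costs, beta=30.0):
    """Weighted average of particles; low-cost particles dominate."""
    weights = np.exp(-beta * (costs - costs.min()))    # shift for stability
    return (weights[:, None] * thetas).sum(axis=0) / weights.sum()


def momentum_cbo(n_particles=100, n_iters=300, lr=0.1, b1=0.9,
                 noise_scale=0.3, seed=0):
    """Simplified momentum variant of CBO (illustrative assumption)."""
    rng = np.random.default_rng(seed)
    # Fixed simulation noise shared by all particles (common random numbers).
    noise = rng.standard_normal((256, 20, 2))
    thetas = rng.uniform(-1.0, 3.0, size=(n_particles, 2))
    momentum = np.zeros_like(thetas)
    for _ in range(n_iters):
        costs = mc_cost(thetas, noise)
        diff = thetas - consensus_point(thetas, costs)
        # Drift toward the consensus point, smoothed by a moving average.
        momentum = b1 * momentum + (1.0 - b1) * diff
        # Gaussian exploration noise, scaled by distance to the consensus.
        xi = rng.standard_normal(thetas.shape)
        thetas = thetas - lr * momentum + noise_scale * np.sqrt(lr) * diff * xi
    return consensus_point(thetas, mc_cost(thetas, noise)), noise


if __name__ == "__main__":
    theta, noise = momentum_cbo()
    print("consensus gains:", theta)
    print("estimated cost:", mc_cost(theta[None, :], noise)[0])
```

Because the exploration noise is proportional to each particle's distance from the consensus point, it shrinks automatically as the particles agree. The `noise_scale` parameter plays the role of the adjustable Gaussian noise mentioned in the abstract, under the assumptions stated above.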
Taxonomy
Topics: Distributed systems and fault tolerance · Logic, Reasoning, and Knowledge · Formal Methods in Verification
