Second-Order MPC-Based Distributed Q-Learning

Samuel Mallick; Filippo Airaldi; Azita Dabiri; Bart De Schutter

arXiv:2511.16424 · eess.SY · May 7, 2026

TL;DR

This paper introduces a second-order extension to distributed MPC-based Q-learning. Each agent updates its parameters using only local information and neighbor-to-neighbor communication, and in simulation the method converges faster than first-order distributed Q-learning.

Contribution

The paper presents a second-order, MPC-based distributed Q-learning algorithm that improves learning speed without requiring global information.

Findings

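01. Second-order information can significantly speed up convergence and permits higher learning rates without introducing instability.

02. Each agent's update relies only on locally available information and neighbor-to-neighbor communication.

03. In simulation, the approach significantly outperforms first-order distributed Q-learning.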

Abstract

The state of the art for model predictive control (MPC)-based distributed Q-learning is limited to first-order gradient updates of the MPC parameterization. In general, using second-order information can significantly improve the speed of convergence for learning, allowing the use of higher learning rates without introducing instability. This work presents a second-order extension to MPC-based Q-learning with updates distributed across local agents, relying only on locally available information and neighbor-to-neighbor communication. In simulation the approach is demonstrated to significantly outperform first-order distributed Q-learning.
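
The abstract's general claim, that second-order information allows higher learning rates without instability, can be illustrated on a toy problem. The sketch below is not the paper's algorithm: it is neither distributed nor MPC-based. It assumes a hypothetical ill-conditioned quadratic loss and compares a plain gradient step with a Newton-type step that rescales the gradient by the inverse Hessian. The matrix `A`, the starting point, and all learning rates are illustrative choices.

```python
import numpy as np

# Hypothetical ill-conditioned quadratic loss L(theta) = 0.5 * theta^T A theta.
# A is also the (constant) Hessian of the loss. Not taken from the paper.
A = np.diag([1.0, 100.0])


def grad(theta):
    """Gradient of the quadratic loss."""
    return A @ theta


def first_order_step(theta, lr):
    """Plain gradient descent step."""
    return theta - lr * grad(theta)


def second_order_step(theta, lr):
    """Newton-type step: the gradient is preconditioned by the inverse Hessian."""
    return theta - lr * np.linalg.solve(A, grad(theta))


def run(step, lr, iters=20):
    """Apply `step` repeatedly and return the distance to the minimizer (the origin)."""
    theta = np.array([1.0, 1.0])
    for _ in range(iters):
        theta = step(theta, lr)
    return np.linalg.norm(theta)


# Gradient descent is only stable for lr < 2 / 100 here (largest Hessian eigenvalue is 100).
first_order_stable = run(first_order_step, lr=0.019)
first_order_unstable = run(first_order_step, lr=0.5)
# The Newton-type step tolerates the much larger learning rate.
second_order = run(second_order_step, lr=0.5)

print(first_order_stable, first_order_unstable, second_order)
```

With the small learning rate the first-order iterate is still far from the minimizer after 20 steps, and with the large one it diverges, while the preconditioned step contracts every direction at the same rate. In the setting the abstract describes, curvature information would have to be assembled from local quantities and neighbor-to-neighbor communication rather than from a global Hessian as in this sketch.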
