TL;DR
This paper introduces a novel meta-offline distributional multi-agent reinforcement learning algorithm, M-CQR, combining conservative Q-learning, risk-sensitive quantile regression, and meta-learning for improved risk-aware decision-making in dynamic, uncertain environments.
Contribution
It presents the first integration of meta-offline distributional MARL with risk-sensitive methods, enabling rapid adaptation and safer learning in complex multi-agent scenarios.
Findings
In a UAV-based communication scenario, the M-CTDE-CQR variant converges up to 50% faster than baseline MARL methods.
The proposed approach improves scalability, robustness, and adaptability.
Code is publicly available at https://github.com/Eslam211/MA_Meta_ODRL
Abstract
Mission-critical applications, such as UAV-assisted IoT networks, require risk-aware decision-making under dynamic topologies and uncertain channels. We propose meta-conservative quantile regression (M-CQR), a meta-offline distributional MARL algorithm that integrates conservative Q-learning (CQL) for safe offline learning, quantile regression DQN (QR-DQN) for risk-sensitive value estimation, and model-agnostic meta-learning (MAML) for rapid adaptation. Two variants are developed: meta-independent CQR (M-I-CQR) and meta-CTDE-CQR (M-CTDE-CQR). In a UAV-based communication scenario, M-CTDE-CQR achieves up to 50% faster convergence and outperforms baseline MARL methods, offering improved scalability, robustness, and adaptability for risk-sensitive decision-making. Code is available at https://github.com/Eslam211/MA_Meta_ODRL
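To make the combination of CQL and QR-DQN described in the abstract more concrete, the following is a minimal NumPy sketch of how a quantile-regression loss and a conservative penalty might be combined for a single agent and a single transition. It is not the authors' implementation (see the linked repository for that). The function names, the penalty weight `cql_weight`, and the use of CVaR as the risk measure are all assumptions made for illustration, and the MAML outer loop is omitted entirely.

```python
import numpy as np

def quantile_huber_loss(pred_quantiles, target_samples, kappa=1.0):
    """Quantile Huber loss in the style of QR-DQN for one (state, action) pair.

    pred_quantiles: shape (N,), predicted return quantiles.
    target_samples: shape (M,), samples of the target return distribution.
    """
    n = pred_quantiles.shape[0]
    taus = (np.arange(n) + 0.5) / n  # quantile midpoints
    # Pairwise errors u[i, j] = target_j - pred_i
    u = target_samples[None, :] - pred_quantiles[:, None]
    abs_u = np.abs(u)
    huber = np.where(abs_u <= kappa, 0.5 * u ** 2, kappa * (abs_u - 0.5 * kappa))
    # Asymmetric weight |tau_i - 1{u < 0}|
    weight = np.abs(taus[:, None] - (u < 0).astype(float))
    # Mean over target samples, sum over predicted quantiles
    return float((weight * huber).mean(axis=1).sum())

def cql_penalty(quantiles_all_actions, data_action):
    """CQL-style regularizer: logsumexp_a Q(s, a) - Q(s, a_data).

    quantiles_all_actions: shape (A, N), quantiles for every action.
    Q(s, a) is taken as the mean of the quantiles (an assumption here).
    """
    q_values = quantiles_all_actions.mean(axis=1)
    m = q_values.max()
    log_sum_exp = m + np.log(np.exp(q_values - m).sum())  # numerically stable
    return float(log_sum_exp - q_values[data_action])

def cvar(quantiles, alpha):
    """Mean of the lowest alpha-fraction of quantiles (one common risk measure)."""
    k = max(1, int(np.ceil(alpha * quantiles.shape[0])))
    return float(np.sort(quantiles)[:k].mean())

def conservative_quantile_loss(quantiles_all_actions, data_action,
                               target_samples, cql_weight=1.0):
    """Hypothetical combined objective: quantile loss + weighted CQL penalty."""
    qr = quantile_huber_loss(quantiles_all_actions[data_action], target_samples)
    return qr + cql_weight * cql_penalty(quantiles_all_actions, data_action)
```

In a sketch like this, a risk-sensitive agent could pick the action whose quantiles have the highest `cvar` value rather than the highest mean, while the penalty term pushes down values of actions that do not appear in the offline dataset.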
