Multiagent Model-based Credit Assignment for Continuous Control
Dongge Han, Chris Xiaoxuan Lu, Tomasz Michalak, Michael Wooldridge

TL;DR
This paper introduces a decentralized multiagent reinforcement learning framework for continuous control in robotics, combining cooperative PPO, game-theoretic credit assignment, and model-based RL to improve sample efficiency and enable decentralized operation.
Contribution
It presents a novel decentralized multiagent RL framework with game-theoretic credit assignment and model-based components for continuous control tasks.
Findings
Effective in MuJoCo locomotion tasks
Improves sample efficiency significantly
Enables decentralized control without communication
Abstract
Deep reinforcement learning (RL) has recently shown great promise in robotic continuous control tasks. Nevertheless, prior research in this vein centers on the centralized learning setting, which largely relies on communication being available among all the components of a robot. However, agents in the real world often operate in a decentralised fashion without communication due to latency requirements, limited power budgets and safety concerns. By formulating robotic components as a system of decentralised agents, this work presents a decentralised multiagent reinforcement learning framework for continuous control. To this end, we first develop a cooperative multiagent PPO framework that allows for centralized optimisation during training and decentralised operation during execution. However, the system only receives a global reward signal which is not attributed to each agent.…
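To make the credit-assignment problem concrete, the sketch below splits a single global reward among agents using the Shapley value, one standard game-theoretic attribution. The truncated abstract does not say which attribution the authors use, so the choice of the Shapley value is an assumption made for illustration. The agent names (`hip`, `knee`, `ankle`), the function `shapley_credits`, and the coalition value function `toy_value` are all hypothetical.

```python
from itertools import permutations
from typing import Callable, Dict, FrozenSet, Sequence


def shapley_credits(
    agents: Sequence[str],
    coalition_value: Callable[[FrozenSet[str]], float],
) -> Dict[str, float]:
    """Exact Shapley values: average each agent's marginal contribution
    over every ordering in which agents could join the coalition.
    Cost grows factorially, so this is only practical for a few agents."""
    totals = {agent: 0.0 for agent in agents}
    orderings = list(permutations(agents))
    for order in orderings:
        members: FrozenSet[str] = frozenset()
        previous = coalition_value(members)
        for agent in order:
            members = members | {agent}
            current = coalition_value(members)
            totals[agent] += current - previous  # marginal contribution
            previous = current
    return {agent: total / len(orderings) for agent, total in totals.items()}


def toy_value(coalition: FrozenSet[str]) -> float:
    """Hypothetical global reward earned by a subset of joints.
    Assumed numbers: hip alone is worth 2, knee alone 1, and the two
    together earn a synergy bonus of 3; the ankle adds nothing."""
    value = 0.0
    if "hip" in coalition:
        value += 2.0
    if "knee" in coalition:
        value += 1.0
    if {"hip", "knee"} <= coalition:
        value += 3.0
    return value


credits = shapley_credits(["hip", "knee", "ankle"], toy_value)
print(credits)
```

In this toy setting the per-agent credits sum to the reward of the full team, which is the property that makes such an attribution usable as a per-agent learning signal in place of the shared global reward.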
Taxonomy
Topics: Reinforcement Learning in Robotics · Robotic Locomotion and Control · Zebrafish Biomedical Research Applications
Methods: Entropy Regularization · Proximal Policy Optimization
