Q-CP: Learning Action Values for Cooperative Planning
Francesco Riccio, Roberto Capobianco, Daniele Nardi

TL;DR
Q-CP is a novel cooperative reinforcement learning algorithm that uses action values to improve exploration and efficiency in multi-robot planning tasks with high complexity and uncertainty.
Contribution
It introduces a model-based RL method combining Q-learning with Monte-Carlo Tree Search to enhance cooperative multi-robot planning under uncertainty.
Findings
Q-CP reduces computational demand in multi-robot planning.
Q-CP achieves effective coordination in various robot scenarios.
Q-CP outperforms baseline methods in stochastic cooperative games.
Abstract
Research on multi-robot systems has demonstrated promising results in manifold applications and domains. Still, efficiently learning effective robot behaviors is very difficult, due to unstructured scenarios, high uncertainties, and large state dimensionality (e.g. hyper-redundant robots and groups of robots). To alleviate this problem, we present Q-CP, a cooperative model-based reinforcement learning algorithm that exploits action values to both (1) guide the exploration of the state space and (2) generate effective policies. Specifically, we exploit Q-learning to attack the curse of dimensionality in the iterations of a Monte-Carlo Tree Search. We implement and evaluate Q-CP on different stochastic cooperative (general-sum) games: (1) a simple cooperative navigation problem among 3 robots, (2) a cooperation scenario between a pair of KUKA YouBots performing hand-overs, and (3) a…
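The general idea of letting learned action values steer Monte-Carlo simulations can be sketched on a toy problem. The block below is not the authors' Q-CP implementation: it uses flat Monte-Carlo rollouts with tabular Q-learning backups instead of a full search tree, a single agent instead of cooperating robots, and a hypothetical 1-D corridor as the generative model. All names (step, greedy, q_guided_search) and all parameter values are assumptions made for illustration.

```python
import random
from collections import defaultdict

ACTIONS = (-1, +1)   # move left / move right
N_STATES = 6         # hypothetical corridor with states 0..5
GOAL = N_STATES - 1  # reaching the last cell ends the episode


def step(state, action, rng, slip=0.1):
    """Toy generative model (assumed): the move is reversed with probability `slip`."""
    if rng.random() < slip:
        action = -action
    nxt = min(max(state + action, 0), GOAL)
    reward = 1.0 if nxt == GOAL else 0.0
    return nxt, reward, nxt == GOAL


def greedy(q, state, rng):
    """Pick an action with the highest Q-value, breaking ties at random."""
    best = max(q[(state, a)] for a in ACTIONS)
    return rng.choice([a for a in ACTIONS if q[(state, a)] == best])


def q_guided_search(root, q, rng, n_sims=200, depth=20,
                    alpha=0.5, gamma=0.9, eps=0.2, slip=0.1):
    """Run simulations from `root`; Q-values guide action choice and are
    refined with a Q-learning backup after every simulated transition."""
    for _ in range(n_sims):
        s = root
        for _ in range(depth):
            # Epsilon-greedy: mostly follow current action values, sometimes explore.
            a = rng.choice(ACTIONS) if rng.random() < eps else greedy(q, s, rng)
            s2, r, done = step(s, a, rng, slip)
            # Standard Q-learning target: bootstrap from the best next action.
            target = r if done else r + gamma * max(q[(s2, b)] for b in ACTIONS)
            q[(s, a)] += alpha * (target - q[(s, a)])
            if done:
                break
            s = s2
    # Recommend the root action with the highest learned value.
    return max(ACTIONS, key=lambda b: q[(root, b)])


if __name__ == "__main__":
    rng = random.Random(0)
    q = defaultdict(float)
    print("recommended action at state 0:", q_guided_search(0, q, rng))
```

The same Q-table serves both roles named in the abstract: it biases which actions the simulations try, and the final recommendation is read directly from it. A faithful version would keep a search tree and handle joint actions of several robots, which this sketch leaves out.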
Taxonomy
Methods: Monte-Carlo Tree Search · Q-Learning
