Path Planning Problems with Side Observations: When Colonels Play Hide-and-Seek
Dong Quan Vu, Patrick Loiseau, Alonso Silva, Long Tran-Thanh

TL;DR
This paper models resource allocation games like Colonel Blotto and Hide-and-Seek as path planning problems with side observations, introducing a novel efficient algorithm with proven regret bounds for these complex online learning scenarios.
Contribution
The paper introduces EXP3-OE, the first efficient algorithm for path planning problems with side observations (SOPPP) that needs no auxiliary oracle. It comes with proven regret bounds, which improve further under additional assumptions on the observability model.
Findings
EXP3-OE's regret bound matches the order of the best existing benchmarks.
The algorithm is the first for SOPPP with guaranteed efficient running time and no auxiliary oracle.
Applying it to the online Colonel Blotto (CB) and Hide-and-Seek (HS) games illustrates its practical benefits.
Abstract
Resource allocation games such as the famous Colonel Blotto (CB) and Hide-and-Seek (HS) games are often used to model a large variety of practical problems, but only in their one-shot versions. Indeed, due to their extremely large strategy space, it remains an open question how one can efficiently learn in these games. In this work, we show that the online CB and HS games can be cast as path planning problems with side-observations (SOPPP): at each stage, a learner chooses a path on a directed acyclic graph and suffers the sum of losses that are adversarially assigned to the corresponding edges; and she then receives semi-bandit feedback with side-observations (i.e., she observes the losses on the chosen edges plus some others). We propose a novel algorithm, EXP3-OE, the first-of-its-kind with guaranteed efficient running time for SOPPP without requiring any auxiliary oracle. We provide…
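The interaction protocol described in the abstract can be made concrete with a small sketch of a single SOPPP round. This is only an illustration under stated assumptions: the toy graph `EDGES`, the side-observation map `SIDE_OBS`, and the helper names `all_paths` and `play_round` are hypothetical and do not come from the paper. The path is chosen uniformly at random as a placeholder policy, and paths are enumerated by brute force, which is exactly what an efficient algorithm such as EXP3-OE is meant to avoid.

```python
import random

# Hypothetical toy DAG (not from the paper): edges from source "s" to sink "t".
EDGES = [("s", "a"), ("s", "b"), ("a", "t"), ("b", "t"), ("a", "b")]

# Assumed observation structure: playing an edge also reveals the listed edges.
SIDE_OBS = {
    ("s", "a"): [("s", "b")],
    ("s", "b"): [],
    ("a", "t"): [("b", "t")],
    ("b", "t"): [],
    ("a", "b"): [],
}


def all_paths(edges, source, sink):
    """Enumerate every source-to-sink path as a list of edges (brute force)."""
    if source == sink:
        return [[]]
    paths = []
    for (u, v) in edges:
        if u == source:
            for rest in all_paths(edges, v, sink):
                paths.append([(u, v)] + rest)
    return paths


def play_round(path, losses):
    """Return the loss suffered on `path` and the edge losses revealed."""
    # The learner suffers the sum of the losses on the chosen edges.
    suffered = sum(losses[e] for e in path)
    # Semi-bandit feedback: the chosen edges are always observed ...
    observed = set(path)
    # ... plus any edges revealed as side observations.
    for e in path:
        observed.update(SIDE_OBS[e])
    feedback = {e: losses[e] for e in observed}
    return suffered, feedback


rng = random.Random(0)
paths = all_paths(EDGES, "s", "t")
losses = {e: rng.random() for e in EDGES}  # stand-in for the adversary's losses
chosen = rng.choice(paths)                 # placeholder policy, NOT EXP3-OE
suffered, feedback = play_round(chosen, losses)
```

In this sketch, `feedback` always contains the edges of the chosen path and may contain others, which is the extra information a learner with side observations can use to update its loss estimates.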
Taxonomy
Topics: Advanced Bandit Algorithms Research · Reinforcement Learning in Robotics · Optimization and Search Problems
