Risk-seeking conservative policy iteration with agent-state based policies for Dec-POMDPs with guaranteed convergence
Amit Sinha, Matthieu Geist, Aditya Mahajan

TL;DR
This paper introduces a risk-seeking conservative policy iteration method for Dec-POMDPs that searches over limited-memory (agent-state) policies, guarantees convergence to a local optimum, and performs comparably to state-of-the-art methods on benchmark problems.
Contribution
It proposes an iterated best response style policy iteration algorithm that adds a risk-seeking incentive to the objective and guarantees convergence when each agent is restricted to a limited number of memory states (agent states).
Findings
The approach performs comparably to state-of-the-art methods on benchmark problems.
Using more agent states improves policy performance.
The method guarantees monotonic improvement and convergence to a local optimum, with runtime polynomial in the Dec-POMDP model size.
Abstract
Optimally solving decentralized decision-making problems modeled as Dec-POMDPs is known to be NEXP-complete. These optimal solutions are policies based on the entire history of observations and actions of an agent. However, some applications may require more compact policies because of limited compute capabilities, which can be modeled by considering a limited number of memory states (or agent states). While such an agent-state based policy class may not contain the optimal solution, it is still of practical interest to find the best agent-state policy within the class. We focus on an iterated best response style algorithm which guarantees monotonic improvements and convergence to a local optimum in polynomial runtime in the Dec-POMDP model size. In order to obtain a better local optimum, we use a modified objective which incentivizes risk-seeking alongside a conservative policy…
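The monotonic-improvement property of an iterated best response scheme can be illustrated with a minimal sketch. This is not the paper's algorithm: it assumes a stateless two-agent cooperative game with a shared payoff matrix `R` instead of a Dec-POMDP, it omits the risk-seeking objective, and the names `expected_value`, `conservative_best_response`, `iterate`, and the step size `alpha` are all hypothetical. Each agent in turn mixes its current policy with a best response to the other agent's fixed policy, which can never lower the shared expected value.

```python
def expected_value(R, p1, p2):
    """Shared expected payoff p1^T R p2 for two independent stochastic policies."""
    return sum(p1[i] * R[i][j] * p2[j]
               for i in range(len(p1)) for j in range(len(p2)))


def conservative_best_response(R, p_self, p_other, alpha, agent):
    """Mix p_self with a greedy best response to p_other (step size alpha)."""
    if agent == 0:
        # Value of each row action when agent 1 plays p_other.
        q = [sum(R[i][j] * p_other[j] for j in range(len(p_other)))
             for i in range(len(p_self))]
    else:
        # Value of each column action when agent 0 plays p_other.
        q = [sum(R[i][j] * p_other[i] for i in range(len(p_other)))
             for j in range(len(p_self))]
    best = max(range(len(q)), key=q.__getitem__)
    # Conservative update: (1 - alpha) * old policy + alpha * greedy policy.
    return [(1 - alpha) * p + (alpha if k == best else 0.0)
            for k, p in enumerate(p_self)]


def iterate(R, p1, p2, alpha=0.5, sweeps=50):
    """Alternate conservative best responses; record the value after each update."""
    values = [expected_value(R, p1, p2)]
    for _ in range(sweeps):
        p1 = conservative_best_response(R, p1, p2, alpha, agent=0)
        values.append(expected_value(R, p1, p2))
        p2 = conservative_best_response(R, p2, p1, alpha, agent=1)
        values.append(expected_value(R, p1, p2))
    return p1, p2, values


# Coordination game with two local optima: joint payoffs 10 and 5.
R = [[10.0, 0.0], [0.0, 5.0]]
p1, p2, values = iterate(R, [0.2, 0.8], [0.2, 0.8])
print(values[0], values[-1])
```

With this starting point the value sequence never decreases, but the policies settle on the joint action worth 5 and not the one worth 10. This is the kind of poor local optimum that, per the abstract, motivates modifying the objective.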
