Forward and Backward Bellman equations improve the efficiency of EM algorithm for DEC-POMDP
Takehiro Tottori, Tetsuya J. Kobayashi

TL;DR
This paper introduces Bellman-based modifications to the EM algorithm for DEC-POMDPs that replace the forward-backward algorithm, which EM must run up to the infinite horizon, with forward and backward Bellman equations; in experiments, the modified variant (MBEM) converges faster than EM.
Contribution
The paper proposes the Bellman EM (BEM) and modified Bellman EM (MBEM) algorithms, which introduce the forward and backward Bellman equations into EM to improve efficiency in solving DEC-POMDPs.
Findings
MBEM converges faster than EM in experiments.
BEM is more efficient than EM for small problem sizes.
MBEM avoids matrix inversion, improving scalability.
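The scalability point can be illustrated generically. Assume a fixed policy that induces a Markov chain with transition matrix P over n joint states (for example, environment state paired with controller nodes). A backward Bellman equation then takes the linear form V = r + γPV, so solving it exactly means solving (I - γP)V = r, which costs O(n^3) with Gaussian elimination, the same order as inverting the matrix. An inversion-free alternative repeatedly applies the Bellman update at O(n^2) per sweep. The Python sketch below compares the two on a hypothetical 3-state chain. `P`, `r`, and `gamma` are made-up values, and the in-place (Gauss-Seidel) sweep is a generic solver chosen for illustration, not necessarily the update MBEM uses.

```python
def solve_direct(P, r, gamma):
    """Solve (I - gamma P) V = r by Gaussian elimination with partial pivoting: O(n^3)."""
    n = len(r)
    # Augmented matrix [I - gamma P | r]
    A = [[(1.0 if i == j else 0.0) - gamma * P[i][j] for j in range(n)] + [r[i]]
         for i in range(n)]
    for col in range(n):
        # Swap in the row with the largest pivot for numerical stability
        pivot = max(range(col, n), key=lambda i: abs(A[i][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for i in range(col + 1, n):
            f = A[i][col] / A[col][col]
            for j in range(col, n + 1):
                A[i][j] -= f * A[col][j]
    # Back substitution on the upper-triangular system
    V = [0.0] * n
    for i in reversed(range(n)):
        s = sum(A[i][j] * V[j] for j in range(i + 1, n))
        V[i] = (A[i][n] - s) / A[i][i]
    return V


def solve_iterative(P, r, gamma, tol=1e-10, max_sweeps=10_000):
    """Inversion-free: in-place (Gauss-Seidel) sweeps V_i <- r_i + gamma * sum_j P_ij V_j.

    Each sweep costs O(n^2); convergence follows because gamma * P is a
    contraction in the max norm when P is row-stochastic and gamma < 1.
    """
    n = len(r)
    V = [0.0] * n
    for sweep in range(1, max_sweeps + 1):
        delta = 0.0
        for i in range(n):
            new = r[i] + gamma * sum(P[i][j] * V[j] for j in range(n))
            delta = max(delta, abs(new - V[i]))
            V[i] = new  # reuse the updated value within the same sweep
        if delta < tol:
            return V, sweep
    return V, max_sweeps


# Hypothetical 3-state chain under a fixed policy
P = [[0.5, 0.3, 0.2],
     [0.1, 0.6, 0.3],
     [0.2, 0.2, 0.6]]
r = [1.0, 0.0, 2.0]
V_direct = solve_direct(P, r, 0.9)
V_iter, sweeps = solve_iterative(P, r, 0.9)
print(V_direct, V_iter, sweeps)
```

Both calls should agree to within the iteration tolerance. The trade-off is that the direct solve's cost grows cubically with the number of joint states, while the iterative solve trades that for a number of O(n^2) sweeps that depends on γ.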
Abstract
Decentralized partially observable Markov decision process (DEC-POMDP) models sequential decision-making problems by a team of agents. Since the planning of DEC-POMDP can be interpreted as maximum likelihood estimation for a latent variable model, DEC-POMDP can be solved by the EM algorithm. However, in EM for DEC-POMDP, the forward-backward algorithm needs to be calculated up to the infinite horizon, which impairs computational efficiency. In this paper, we propose the Bellman EM algorithm (BEM) and the modified Bellman EM algorithm (MBEM) by introducing the forward and backward Bellman equations into EM. BEM can be more efficient than EM because BEM calculates the forward and backward Bellman equations instead of running the forward-backward algorithm up to the infinite horizon. However, BEM cannot always be more efficient than EM when the problem size is large because BEM…
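The following is a sketch of why Bellman equations can stand in for an infinite-horizon forward-backward pass. It rests on simplifying assumptions: a fixed policy induces a Markov chain with transition matrix P over joint states, forward messages follow α_{t+1} = α_t P from an initial distribution p0, backward messages follow β_{t+1} = Pβ_t from a reward vector r, and horizon t is weighted by γ^t. The discounted sums d = Σ_t γ^t α_t and V = Σ_t γ^t β_t then satisfy the linear equations d = p0 + γdP (forward) and V = r + γPV (backward), so they can be obtained without stepping through horizons. The truncated loop below mimics accumulating forward-backward messages. The number of steps it needs grows roughly like log(1/ε)/(1 - γ) as γ approaches 1. All numbers are hypothetical, and the paper's actual messages, over environment states and the agents' controller nodes, are richer than this toy chain.

```python
import math

# Hypothetical joint chain under a fixed policy:
# P[i][j] = Pr(next joint state j | current joint state i)
P = [[0.5, 0.3, 0.2],
     [0.1, 0.6, 0.3],
     [0.2, 0.2, 0.6]]
r = [1.0, 0.0, 2.0]    # expected immediate reward per joint state (hypothetical)
p0 = [1.0, 0.0, 0.0]   # initial joint-state distribution (hypothetical)


def forward_step(alpha):
    """alpha_{t+1} = alpha_t P (row vector times matrix)."""
    n = len(alpha)
    return [sum(alpha[i] * P[i][j] for i in range(n)) for j in range(n)]


def backward_step(beta):
    """beta_{t+1} = P beta_t (matrix times column vector)."""
    n = len(beta)
    return [sum(P[i][j] * beta[j] for j in range(n)) for i in range(n)]


def truncated_forward_backward(gamma, T):
    """Accumulate d = sum_t gamma^t alpha_t and V = sum_t gamma^t beta_t for t = 0..T."""
    alpha, beta = p0[:], r[:]
    d, V = [0.0] * len(p0), [0.0] * len(r)
    for t in range(T + 1):
        w = gamma ** t
        d = [x + w * a for x, a in zip(d, alpha)]
        V = [x + w * b for x, b in zip(V, beta)]
        alpha, beta = forward_step(alpha), backward_step(beta)
    return d, V


def horizon_needed(gamma, eps):
    """Smallest T such that the discarded horizon weight gamma^(T+1) is at most eps."""
    return math.ceil(math.log(eps) / math.log(gamma)) - 1


def bellman_residuals(gamma, d, V):
    """Max-norm residuals of d = p0 + gamma d P (forward) and V = r + gamma P V (backward)."""
    fwd = max(abs(x - (p + gamma * y)) for x, p, y in zip(d, p0, forward_step(d)))
    bwd = max(abs(x - (q + gamma * y)) for x, q, y in zip(V, r, backward_step(V)))
    return fwd, bwd


for gamma in (0.9, 0.99, 0.999):
    T = horizon_needed(gamma, 1e-8)
    d, V = truncated_forward_backward(gamma, T)
    fwd, bwd = bellman_residuals(gamma, d, V)
    value_backward = sum(p * v for p, v in zip(p0, V))   # p0 . V
    value_forward = sum(x * q for x, q in zip(d, r))     # d . r
    print(gamma, T, fwd, bwd, value_backward, value_forward)
```

The truncated sums satisfy both Bellman equations up to a residual proportional to γ^(T+1), and both directions give the same expected discounted reward, p0·V = d·r. The printed T shows how quickly the truncation horizon grows as γ approaches 1, whereas the Bellman equations have the same size regardless of γ.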
