A Counterexample and a Corrective to the Vector Extension of the Bellman Equations of a Markov Decision Process
Anas Mifrani

TL;DR
This paper identifies a flaw in the vector extension of Bellman equations for Markov decision processes, provides a counterexample, and establishes conditions under which the extension is valid, improving understanding of multi-objective decision-making.
Contribution
The paper corrects D. J. White's 1982 extension of the Bellman equations to vector rewards by providing a counterexample and establishing a sufficient condition for the extension to be valid.
Findings
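A counterexample shows that the assumptions underlying White's extension do not guarantee its validity.
A sufficient condition for validity holds when the policy space is refined to include a special class of non-Markovian policies, when the dynamics are deterministic, or when the horizon does not exceed three time steps.
In general, the solutions to White's equations are sets of Pareto efficient policy returns over the refined policy space.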
Abstract
Under the expected total reward criterion, the optimal value of a finite-horizon Markov decision process can be determined by solving the Bellman equations. The equations were extended by D. J. White to processes with vector rewards in 1982. Using a counterexample, we show that the assumptions underlying this extension fail to guarantee its validity. Analysis of the counterexample leads us to articulate a sufficient condition for White's functional equations to be valid. The condition is shown to be true when the policy space has been refined to include a special class of non-Markovian policies, or when the dynamics of the model are deterministic, or when the decision-making horizon does not exceed three time steps. The paper demonstrates that, in general, the solutions to White's equations are sets of Pareto efficient policy returns over the refined policy space. Our results are…
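To make the idea of a vector-valued Bellman recursion concrete, here is a minimal sketch of one backward-induction step in which each state carries a set of non-dominated return vectors rather than a single scalar value. The abstract does not state the exact form of White's equations, so the recursion below (immediate reward plus a probability-weighted combination of one continuation vector per successor state, followed by a Pareto filter) is an assumption, and the names `dominates`, `efficient`, `backward_step` and the dictionary layout are hypothetical.

```python
from itertools import product


def dominates(u, v):
    """Return True if vector u Pareto-dominates v (componentwise maximization)."""
    return all(a >= b for a, b in zip(u, v)) and any(a > b for a, b in zip(u, v))


def efficient(points):
    """Keep only the non-dominated (Pareto efficient) vectors of a finite set."""
    pts = set(points)
    return {p for p in pts if not any(dominates(q, p) for q in pts)}


def backward_step(states, actions, reward, trans, next_sets):
    """One set-valued backward step (assumed form, not taken from the paper).

    reward[(s, a)]  -> tuple of immediate vector rewards
    trans[(s, a)]   -> dict mapping successor state to probability
    next_sets[j]    -> set of return vectors available from state j at the next stage
    """
    result = {}
    for s in states:
        candidates = set()
        for a in actions[s]:
            succ = [(j, p) for j, p in trans[(s, a)].items() if p > 0]
            r = reward[(s, a)]
            # Pick one continuation vector per reachable successor state.
            for combo in product(*(next_sets[j] for j, _ in succ)):
                total = tuple(
                    r[k] + sum(p * v[k] for (_, p), v in zip(succ, combo))
                    for k in range(len(r))
                )
                candidates.add(total)
        result[s] = efficient(candidates)
    return result
```

Starting from a terminal set such as `{(0, 0)}` for every state and calling `backward_step` once per stage yields, for each state, the efficient set produced by this recursion. Whether those vectors are actually attainable by policies in a given class is precisely the question the paper examines, so the sketch illustrates the computation only, not its validity.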
Taxonomy
Topics: Climate Change Policy and Economics · Economic theories and models
