On Optimality of Myopic Policy for Restless Multi-armed Bandit Problem with Non i.i.d. Arms and Imperfect Detection
Kehao Wang, Lin Chen, Quan Liu, Khaldoun Al Agha

TL;DR
This paper analyzes the optimality of myopic policies in complex restless multi-armed bandit problems with non-i.i.d. arms and imperfect detection, providing structural conditions for when myopic strategies are optimal.
Contribution
It introduces three axioms characterizing g-regular functions and derives closed-form structural conditions under which the myopic policy is optimal for RMAB problems with non-i.i.d. arms and imperfect detection.
Findings
Established structural conditions for myopic policy optimality.
Extended analysis to non-i.i.d. Markovian arms with imperfect sensing.
Provided a theoretical framework for practical policy design.
Abstract
We consider the channel access problem in a multi-channel opportunistic communication system with imperfect channel sensing, where the state of each channel evolves as a non-independent and identically distributed (non-i.i.d.) Markov process. This problem can be cast as a restless multi-armed bandit (RMAB) problem, which is intractable because of its exponential computational complexity. A natural alternative is to consider the easily implementable myopic policy that maximizes the immediate reward but ignores the impact of the current strategy on the future reward. In particular, we develop three axioms characterizing a family of generic and practically important functions termed g-regular functions, which includes a wide spectrum of utility functions in engineering. By pursuing a mathematical analysis based on the axioms, we establish a set of closed-form structural conditions for the optimality of…
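The myopic policy described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's formulation: it assumes a commonly used two-state (good/bad) channel model with per-channel transition probabilities `p01` (bad to good) and `p11` (good to good), a per-channel false-alarm probability `eps`, and an immediate reward that increases with the belief that a channel is good, so that the myopic choice reduces to picking the `k` channels with the largest beliefs. The function names and the ACK-based belief update are illustrative assumptions.

```python
from typing import List, Sequence, Set


def propagate(belief: float, p01: float, p11: float) -> float:
    """One-step Markov prediction of P(channel is good)."""
    return belief * p11 + (1.0 - belief) * p01


def myopic_select(beliefs: Sequence[float], k: int) -> List[int]:
    """Indices of the k channels with the largest current belief.

    Assumption: the immediate reward is increasing in the belief, so
    ranking by belief maximizes the expected immediate reward.
    """
    order = sorted(range(len(beliefs)), key=lambda i: beliefs[i], reverse=True)
    return sorted(order[:k])


def update_beliefs(
    beliefs: Sequence[float],
    chosen: Set[int],
    acked: Set[int],
    p01: Sequence[float],
    p11: Sequence[float],
    eps: Sequence[float],
) -> List[float]:
    """Belief update for the next slot under imperfect sensing (assumed model).

    - chosen and ACK received: the channel was certainly good.
    - chosen, no ACK: either a false alarm on a good channel or a bad
      channel; apply Bayes' rule with false-alarm probability eps[i].
    - not chosen: no observation, keep the prior.
    The posterior is then pushed through the Markov transition.
    """
    nxt = []
    for i, w in enumerate(beliefs):
        if i in chosen and i in acked:
            posterior = 1.0
        elif i in chosen:
            den = eps[i] * w + (1.0 - w)
            posterior = eps[i] * w / den if den > 0 else w
        else:
            posterior = w
        nxt.append(propagate(posterior, p01[i], p11[i]))
    return nxt


# Illustrative values only (hypothetical, not taken from the paper).
beliefs = [0.2, 0.7, 0.5]
p01 = [0.2, 0.3, 0.1]
p11 = [0.8, 0.9, 0.6]
eps = [0.1, 0.1, 0.2]

chosen = set(myopic_select(beliefs, k=2))
next_beliefs = update_beliefs(beliefs, chosen, acked={1}, p01=p01, p11=p11, eps=eps)
```

Because each channel carries its own `p01`, `p11`, and `eps`, the sketch covers the non-i.i.d. case in the loose sense that channels need not be statistically identical. The policy itself never looks beyond the current slot, which is what makes it cheap to run and also why its optimality requires structural conditions of the kind the paper studies.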
