The Choice Function Framework for Online Policy Improvement
Murugeswari Issakkimuthu, Alan Fern, Prasad Tadepalli

TL;DR
This paper introduces a formal framework called the choice function framework to analyze online search procedures for policy improvement, providing conditions that guarantee performance does not degrade compared to the original policy.
Contribution
It presents sufficient conditions on choice functions under which online search is guaranteed to perform no worse than the original policy, and introduces a parametric class of such functions with empirical validation.
Findings
Sufficient conditions are established for choice functions to guarantee policy improvement.
A parametric class of choice functions satisfying these conditions is proposed.
Empirical use cases demonstrate the framework's practical utility.
Abstract
There are notable examples of online search improving over hand-coded or learned policies (e.g. AlphaZero) for sequential decision making. It is not clear, however, whether or not policy improvement is guaranteed for many of these approaches, even when given a perfect evaluation function and transition model. Indeed, simple counterexamples show that seemingly reasonable online search procedures can hurt performance compared to the original policy. To address this issue, we introduce the choice function framework for analyzing online search procedures for policy improvement. A choice function specifies the actions to be considered at every node of a search tree, with all other actions being pruned. Our main contribution is to give sufficient conditions for stationary and non-stationary choice functions to guarantee that the value achieved by online search is no worse than the original…
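To make the abstract's definition concrete, here is a minimal Python sketch of depth-limited online search in which a choice function prunes actions at every node of the search tree. Everything in it is an illustrative assumption, not the authors' construction: the six-state ring MDP, the base policy, the hypothetical names `make_choice_function`, `search_value`, and `online_policy`, and the particular choice function, which keeps the base policy's action plus the top-`k` actions by one-step lookahead on the base policy's value. As in the idealized setting the abstract describes, the search uses a perfect transition model and the exact value of the base policy at the leaves.

```python
# Hypothetical toy setup (not from the paper): a deterministic 6-state ring MDP.
N_STATES = 6
ACTIONS = (0, 1, 2)  # assumed: action a moves the agent a steps around the ring
GAMMA = 0.9


def step(s, a):
    """Perfect transition model: returns (next_state, reward); reward 1 for entering the last state."""
    s_next = (s + a) % N_STATES
    reward = 1.0 if s_next == N_STATES - 1 else 0.0
    return s_next, reward


def base_policy(s):
    """Hypothetical base policy: always move one step."""
    return 1


def policy_value(policy, iters=500):
    """Evaluate a deterministic policy by repeatedly applying its Bellman backup."""
    v = [0.0] * N_STATES
    for _ in range(iters):
        new_v = []
        for s in range(N_STATES):
            s_next, r = step(s, policy(s))
            new_v.append(r + GAMMA * v[s_next])
        v = new_v
    return v


def make_choice_function(v_base, k=1):
    """Assumed choice function: keep the top-k actions by one-step lookahead on v_base,
    plus the base policy's action; every other action is pruned."""
    def choose(s):
        def q(a):
            s_next, r = step(s, a)
            return r + GAMMA * v_base[s_next]
        top_k = sorted(ACTIONS, key=q, reverse=True)[:k]
        return set(top_k) | {base_policy(s)}
    return choose


def search_value(s, depth, choose, v_leaf):
    """Depth-limited max over the actions the choice function keeps; leaves use v_leaf."""
    if depth == 0:
        return v_leaf[s]
    best = float("-inf")
    for a in choose(s):
        s_next, r = step(s, a)
        best = max(best, r + GAMMA * search_value(s_next, depth - 1, choose, v_leaf))
    return best


def online_policy(choose, v_leaf, depth):
    """Act greedily at the root of a depth-limited search restricted by `choose`."""
    def act(s):
        def score(a):
            s_next, r = step(s, a)
            return r + GAMMA * search_value(s_next, depth - 1, choose, v_leaf)
        return max(sorted(choose(s)), key=score)
    return act


v_base = policy_value(base_policy)
choose = make_choice_function(v_base, k=1)
mu = online_policy(choose, v_base, depth=3)
mu_table = {s: mu(s) for s in range(N_STATES)}  # cache the search's chosen action per state
v_search = policy_value(lambda s: mu_table[s])

for s in range(N_STATES):
    print(f"state {s}: base {v_base[s]:.3f}  search {v_search[s]:.3f}")
```

In this toy case the choice function is stationary and always retains the base policy's action, so the search never does worse than the base policy. The abstract's counterexamples indicate that seemingly reasonable pruning rules need not behave this well, which is why the framework states explicit sufficient conditions for stationary and non-stationary choice functions.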
