Chasing Ghosts: Competing with Stateful Policies

Uriel Feige; Tomer Koren; Moshe Tennenholtz

arXiv:1407.7635 · cs.LG · July 30, 2014 · 1 citation

TL;DR

This paper studies sequential decision making against stateful reference policies under bandit feedback. It proposes an algorithm with sublinear regret and proves a regret lower bound; the central difficulty is that the learner cannot keep track of the reference policies' internal states.

Contribution

The paper introduces a new algorithm for regret minimization against stateful reference policies under bandit feedback, and proves a regret lower bound for this setting.
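What "regret with respect to stateful reference policies" means can be made concrete with a toy computation. The sketch below assumes, purely for illustration and not as the paper's formal model, that each reference policy is a deterministic finite-state machine over actions with 0/1 rewards, whose next state depends on its current state and the reward of the action it played. Each policy is scored on its own counterfactual state path, independent of what the learner did. The names `FSMPolicy`, `policy_total`, and `realized_regret` are hypothetical, and the quantity computed is the realized regret on one fixed reward sequence rather than an expectation.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple

@dataclass(frozen=True)
class FSMPolicy:
    """Hypothetical deterministic finite-state reference policy (illustration only)."""
    act: Dict[int, int]               # state -> action played in that state
    step: Dict[Tuple[int, int], int]  # (state, reward of its own action) -> next state
    start: int = 0

def policy_total(policy: FSMPolicy, rewards: Sequence[Sequence[int]]) -> int:
    """Reward the policy would have collected, following its own state path."""
    state, total = policy.start, 0
    for row in rewards:               # row[a] is the reward of action a this round
        a = policy.act[state]
        total += row[a]
        state = policy.step[(state, row[a])]
    return total

def realized_regret(policies: List[FSMPolicy],
                    rewards: Sequence[Sequence[int]],
                    learner_actions: Sequence[int]) -> int:
    """Best reference policy's total reward minus the learner's total reward."""
    best = max(policy_total(p, rewards) for p in policies)
    earned = sum(row[a] for row, a in zip(rewards, learner_actions))
    return best - earned

always_one = FSMPolicy(act={0: 1}, step={(0, 0): 0, (0, 1): 0})
# Win-stay, lose-shift over two actions.
wsls = FSMPolicy(act={0: 0, 1: 1},
                 step={(0, 1): 0, (0, 0): 1, (1, 1): 1, (1, 0): 0})
rewards = [[0, 1], [0, 1], [1, 0], [0, 1]]
print(realized_regret([always_one, wsls], rewards, [0, 0, 0, 0]))
```

Here the win-stay, lose-shift policy's action in each round depends on rewards it saw earlier along its own path, which is exactly what a stateless reference policy (a fixed action per round) would not do.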

Findings

01

The proposed algorithm achieves regret O(T / log^{1/4} T), which is sublinear in T.

02

A regret lower bound of Ω(T / log^{3/2} T) is established.

03

Addresses the challenge that the reference policies' internal states are unobservable to the learner under bandit feedback.
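To get a feel for how slowly these bounds decay, the sketch below divides each rate by T, giving average regret per round. The constants hidden by the O and Ω notation are set to 1, which is an assumption made only for illustration. For comparison it also includes the Θ(√T) rate that is standard for stateless reference policies or full-information ("expert") feedback. The function name `per_round_regret` is hypothetical.

```python
import math

def per_round_regret(T: int) -> dict:
    """Regret divided by T for each rate, with all hidden constants set to 1."""
    log_t = math.log(T)
    return {
        "upper": 1.0 / log_t ** 0.25,  # algorithm: O(T / log^{1/4} T)
        "lower": 1.0 / log_t ** 1.5,   # lower bound: Omega(T / log^{3/2} T)
        "sqrt": 1.0 / math.sqrt(T),    # Theta(sqrt(T)) rate of the easier settings
    }

for T in (10**3, 10**6, 10**12):
    print(T, {name: round(v, 4) for name, v in per_round_regret(T).items()})
```

All three normalized rates tend to zero, so all are sublinear, but the two bounds for the stateful bandit setting shrink only polylogarithmically: under the constants-equal-to-1 assumption, the upper-bound rate drops by less than a factor of 2 between T = 10^3 and T = 10^12, whereas the √T rate drops by a factor of more than 10^4.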

Abstract

We consider sequential decision making in a setting where regret is measured with respect to a set of stateful reference policies, and feedback is limited to observing the rewards of the actions performed (the so-called "bandit" setting). If either the reference policies are stateless rather than stateful, or the feedback includes the rewards of all actions (the so-called "expert" setting), previous work shows that the optimal regret grows like $\Theta(\sqrt{T})$ in terms of the number of decision rounds $T$. The difficulty in our setting is that the decision maker unavoidably loses track of the internal states of the reference policies, and thus cannot reliably attribute rewards observed in a certain round to any of the reference policies. In fact, in this setting it is impossible for the algorithm to estimate which policy gives the highest (or even approximately highest) total…
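Why the decision maker "loses track of the internal states" can be illustrated with a toy model. The sketch below assumes, as an illustration rather than the paper's exact model, that each reference policy is a deterministic finite-state machine with 0/1 rewards whose next state depends on its current state and the reward of the action it played. The learner keeps the set of states the policy could be in. Under expert feedback every reward is visible, so the set stays a single state; under bandit feedback, whenever a candidate state would have played a different action than the learner, that reward is hidden and both transitions must be kept. The names `FSMPolicy` and `possible_states` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List, Sequence, Tuple

@dataclass(frozen=True)
class FSMPolicy:
    """Hypothetical deterministic finite-state reference policy (illustration only)."""
    act: Dict[int, int]               # state -> action the policy plays in that state
    step: Dict[Tuple[int, int], int]  # (state, reward of its own action) -> next state
    start: int = 0

def possible_states(policy: FSMPolicy,
                    rewards: Sequence[Sequence[int]],  # rewards[t][a] in {0, 1}
                    learner_actions: Sequence[int],
                    bandit: bool) -> List[FrozenSet[int]]:
    """Set of states the policy could be in, from the learner's point of view."""
    states = {policy.start}
    history = [frozenset(states)]
    for row, a_learner in zip(rewards, learner_actions):
        nxt = set()
        for s in states:
            a_pol = policy.act[s]
            if not bandit or a_pol == a_learner:
                # The reward of the policy's action was observed: exact update.
                nxt.add(policy.step[(s, row[a_pol])])
            else:
                # Bandit feedback hid that reward: follow both possible transitions.
                nxt.update(policy.step[(s, r)] for r in (0, 1))
        states = nxt
        history.append(frozenset(states))
    return history

# Win-stay, lose-shift over two actions; the learner always plays action 0.
wsls = FSMPolicy(act={0: 0, 1: 1},
                 step={(0, 1): 0, (0, 0): 1, (1, 1): 1, (1, 0): 0})
rewards = [[0, 1], [1, 0], [0, 0]]
print(possible_states(wsls, rewards, [0, 0, 0], bandit=False))
print(possible_states(wsls, rewards, [0, 0, 0], bandit=True))
```

With expert feedback every set is a single state. With bandit feedback the set becomes {0, 1} once the policy plays action 1 while the learner plays action 0, and from then on the learner cannot say which action the policy takes, so it cannot attribute the reward it observes to that policy.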

Taxonomy

Topics: Advanced Bandit Algorithms Research · Machine Learning and Algorithms · Optimization and Search Problems