Efficient Algorithms for Adversarial Contextual Learning

Vasilis Syrgkanis; Akshay Krishnamurthy; Robert E. Schapire

arXiv:1602.02454 · cs.LG · February 9, 2016 · 45 cites

Open Access

TL;DR

This paper introduces the first oracle-efficient algorithms with sublinear regret for adversarial contextual bandit problems, covering both the transductive and small separator settings, and extends them to semi-bandit linear optimization and contextual combinatorial optimization.

Contribution

The paper presents oracle-efficient algorithms for adversarial contextual bandits with sublinear regret bounds in the transductive and small separator settings, and shows that the approach carries over to semi-bandit linear optimization and contextual combinatorial optimization.

Findings

01

Achieves regret $O(T^{3/4}\sqrt{K\log(N)})$ in the transductive setting

02

Achieves regret $O(T^{2/3} d^{3/4} K\sqrt{\log(N)})$ in the separator setting

03

Extends to semi-bandit linear optimization and contextual combinatorial optimization
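To get a feel for how the two regret bounds listed above scale, the following sketch evaluates them numerically. It is only an illustration: the constants hidden by the $O(\cdot)$ notation are set to 1, and the values chosen for $K$, $N$, and $d$ are arbitrary assumptions, not figures from the paper.

```python
import math

def transductive_bound(T, K, N):
    """T^{3/4} * sqrt(K * log N), with the hidden constant assumed to be 1."""
    return T ** 0.75 * math.sqrt(K * math.log(N))

def separator_bound(T, K, N, d):
    """T^{2/3} * d^{3/4} * K * sqrt(log N), with the hidden constant assumed to be 1."""
    return T ** (2 / 3) * d ** 0.75 * K * math.sqrt(math.log(N))

if __name__ == "__main__":
    K, N, d = 5, 1000, 10  # illustrative values only
    for T in (10**4, 10**6, 10**8):
        # Dividing by T gives the average regret per round, which shrinks as T grows.
        print(T, transductive_bound(T, K, N) / T, separator_bound(T, K, N, d) / T)
```

Because both exponents on $T$ are below 1, the per-round values printed by the loop decrease as $T$ increases, which is what "sublinear regret" means in practice.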

Abstract

We provide the first oracle-efficient sublinear regret algorithms for adversarial versions of the contextual bandit problem. In this problem, the learner repeatedly takes an action on the basis of a context and receives a reward for the chosen action, with the goal of achieving reward competitive with a large class of policies. We analyze two settings: (i) in the transductive setting, the learner knows the set of contexts a priori; (ii) in the small separator setting, there exists a small set of contexts such that any two policies behave differently on at least one context in the set. Our algorithms fall into the follow-the-perturbed-leader family \cite{Kalai2005} and achieve regret $O(T^{3/4}\sqrt{K\log(N)})$ in the transductive setting and $O(T^{2/3} d^{3/4} K\sqrt{\log(N)})$ in the separator setting, where $K$ is the number of actions, $N$ is the number of baseline policies, and $d$ is the…
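The abstract places the algorithms in the follow-the-perturbed-leader family. The sketch below illustrates that general idea only, under simplifying assumptions that differ from the paper's setting: feedback is full-information rather than bandit, the policy class is small enough to enumerate, and a brute-force search stands in for the optimization oracle. The function names, the exponential perturbation, and the `scale` parameter are all hypothetical choices made for illustration.

```python
import itertools
import random

def best_policy(policies, samples):
    """Brute-force stand-in for an optimization oracle (an assumption): return the
    policy with the highest total reward on (context, reward_vector) pairs."""
    return max(policies, key=lambda pi: sum(r[pi[x]] for x, r in samples))

def ftpl_full_information(contexts, num_actions, rounds, scale, rng):
    """Follow the perturbed leader with full-information feedback.

    `rounds` is a list of (context, reward_vector) pairs chosen by the environment.
    Returns the learner's total reward.
    """
    # Policy class: every mapping from context to action (tiny, for illustration).
    policies = [dict(zip(contexts, acts))
                for acts in itertools.product(range(num_actions), repeat=len(contexts))]
    # Perturbation: one fake sample with random rewards for each known context.
    fake = [(x, [rng.expovariate(1.0 / scale) for _ in range(num_actions)])
            for x in contexts]
    history, total = [], 0.0
    for x, rewards in rounds:
        # Play the policy that is best on the perturbed history so far.
        leader = best_policy(policies, fake + history)
        total += rewards[leader[x]]
        history.append((x, rewards))  # full information: the whole vector is revealed
    return total

if __name__ == "__main__":
    rng = random.Random(0)
    contexts = ["a", "b"]
    # Toy environment: action 0 always pays 1, action 1 always pays 0.
    rounds = [(rng.choice(contexts), [1.0, 0.0]) for _ in range(50)]
    print(ftpl_full_information(contexts, 2, rounds, scale=2.0, rng=rng))
```

Knowing the context set up front is what lets this toy version attach a perturbation to every context before play begins, which loosely mirrors the transductive assumption described in the abstract. Handling bandit feedback, where only the chosen action's reward is observed, requires additional machinery that this sketch does not attempt.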

Taxonomy

Topics: Advanced Bandit Algorithms Research · Machine Learning and Algorithms · Optimization and Search Problems