Filtered Poisson Process Bandit on a Continuum

James A. Grant and Roberto Szechtman

arXiv:2007.09966·cs.LG·July 21, 2020

TL;DR

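This paper introduces a continuum-armed bandit problem in which each action reveals a filtered sample of a non-homogeneous Poisson process, and proposes an upper confidence bound algorithm with data-adaptive discretisation whose O(T^(2/3)) regret, under a Lipschitz assumption, matches the lower bound up to a logarithmic factor.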

Contribution

It formulates a new bandit model with filtered Poisson observations and develops an upper confidence bound algorithm whose regret upper bound matches the lower bound up to a logarithmic factor.

Findings

01 The proposed UCB algorithm achieves O(T^(2/3)) regret under a Lipschitz assumption on the reward function.

02 The lower bounds on regret match the upper bound in order, up to a logarithmic factor.

03 The algorithm requires knowledge of the filtering function but not of the Poisson intensity function.

Abstract

We consider a version of the continuum armed bandit where an action induces a filtered realisation of a non-homogeneous Poisson process. Point data in the filtered sample are then revealed to the decision-maker, whose reward is the total number of revealed points. Using knowledge of the function governing the filtering, but without knowledge of the Poisson intensity function, the decision-maker seeks to maximise the expected number of revealed points over T rounds. We propose an upper confidence bound algorithm for this problem utilising data-adaptive discretisation of the action space. This approach enjoys O(T^(2/3)) regret under a Lipschitz assumption on the reward function. We provide lower bounds on the regret of any algorithm for the problem, via new lower bounds for related finite-armed bandits, and show that the orders of the upper and lower bounds match up to a logarithmic…
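The observation model in the abstract can be illustrated with a short simulation. This is only a sketch under stated assumptions, not the authors' setup: the action space [0, 1], the scalar action, the `intensity` function, the `filt` filtering function, and the bound `LAM_MAX` are all hypothetical choices made for illustration. It samples a non-homogeneous Poisson process by thinning, reveals each point independently with the filtering probability, and compares the average number of revealed points with a numerical estimate of the expected reward.

```python
import math
import random

# Hypothetical intensity on [0, 1] (an assumption, unknown to the decision-maker).
def intensity(x):
    return 20.0 + 10.0 * math.sin(2.0 * math.pi * x)

LAM_MAX = 30.0  # upper bound on the hypothetical intensity, needed for thinning

# Hypothetical filtering function (an assumption, known to the decision-maker):
# probability that a point at x is revealed when action a is played.
def filt(a, x):
    return math.exp(-4.0 * abs(x - a))

def sample_nhpp(rng):
    """Sample a non-homogeneous Poisson process on [0, 1] by thinning."""
    points, x = [], 0.0
    while True:
        x += rng.expovariate(LAM_MAX)  # candidate from a rate-LAM_MAX process
        if x > 1.0:
            return points
        if rng.random() < intensity(x) / LAM_MAX:
            points.append(x)

def filtered_sample(a, rng):
    """Points revealed to the decision-maker after playing action a."""
    return [x for x in sample_nhpp(rng) if rng.random() < filt(a, x)]

def expected_reward(a, grid=2000):
    """Midpoint-rule estimate of the integral of filt(a, x) * intensity(x)."""
    h = 1.0 / grid
    return h * sum(filt(a, (i + 0.5) * h) * intensity((i + 0.5) * h)
                   for i in range(grid))

if __name__ == "__main__":
    rng = random.Random(0)
    a = 0.3
    rewards = [len(filtered_sample(a, rng)) for _ in range(2000)]
    print("empirical mean reward:", sum(rewards) / len(rewards))
    print("expected reward      :", expected_reward(a))
```

In this sketch the reward of a round is `len(filtered_sample(a, rng))`, the number of revealed points, and the revealed locations themselves are the data a learner could use to estimate the intensity.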
