Linear Bandits beyond Inner Product Spaces, the case of Bandit Optimal Transport
Lorenzo Croissant (CREST, FAIRPLAY, ENSAE Paris)

TL;DR
This paper extends linear bandit theory beyond inner product spaces by incorporating Optimal Transport problems, proposing a new algorithm that achieves competitive regret bounds without relying on traditional inner product assumptions.
Contribution
It refines the OFUL algorithm by embedding actions into a Hilbertian subspace, which enables efficient learning in problems that lack an inner product structure, such as Optimal Transport.
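For orientation, the sketch below shows the classical finite-dimensional OFUL loop that the paper takes as its starting point. It is not the paper's refined algorithm. It keeps a ridge (regularised least-squares) estimate of the cost parameter and plays the action with the lowest optimistic cost. The function names, the fixed confidence radius `beta`, the Gaussian noise model, and the toy action set are all assumptions made for illustration.

```python
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(M, v):
    return [dot(row, v) for row in M]


def sherman_morrison(V_inv, x):
    """Return (V + x x^T)^{-1} given the symmetric inverse V^{-1}."""
    Vx = matvec(V_inv, x)
    denom = 1.0 + dot(x, Vx)
    d = len(x)
    return [[V_inv[i][j] - Vx[i] * Vx[j] / denom for j in range(d)]
            for i in range(d)]


def oful_sketch(actions, theta_star, rounds=500, lam=1.0, beta=1.0,
                noise=0.1, seed=0):
    """Classical OFUL for cost minimisation (illustrative, fixed beta)."""
    rng = random.Random(seed)
    d = len(theta_star)
    # V = lam * I is the regularised design matrix; we track its inverse.
    V_inv = [[1.0 / lam if i == j else 0.0 for j in range(d)]
             for i in range(d)]
    b = [0.0] * d          # running sum of y_t * x_t
    theta_hat = [0.0] * d  # ridge estimate V^{-1} b
    counts = [0] * len(actions)
    for _ in range(rounds):
        # Optimism for minimisation: estimated cost minus confidence width.
        scores = [dot(theta_hat, x) - beta * dot(x, matvec(V_inv, x)) ** 0.5
                  for x in actions]
        k = min(range(len(actions)), key=scores.__getitem__)
        x = actions[k]
        y = dot(theta_star, x) + rng.gauss(0.0, noise)  # noisy observed cost
        V_inv = sherman_morrison(V_inv, x)
        b = [bi + y * xi for bi, xi in zip(b, x)]
        theta_hat = matvec(V_inv, b)
        counts[k] += 1
    return theta_hat, counts


# Toy problem: the first action has the lowest true cost (0.2).
theta_hat, counts = oful_sketch(
    actions=[[1.0, 0.0], [0.0, 1.0], [0.6, 0.6]],
    theta_star=[0.2, 0.8],
)
```

In practice `beta` would come from a self-normalised concentration bound rather than being fixed. The rank-one update of `V_inv` avoids re-solving the least-squares problem at every round.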
Findings
Achieves regret bounds similar to classical OFUL up to an approximation term.
Interpolates regret between $\tilde{\mathcal{O}}(\sqrt{T})$ and $\tilde{\mathcal{O}}(T)$, depending on cost regularity.
Recovers parametric rate $\tilde{\mathcal{O}}(\sqrt{dT})$ in finite-dimensional cases.
Abstract
Linear bandits have long been a central topic in online learning, with applications ranging from recommendation systems to adaptive clinical trials. Their general learnability has been established when the objective is to minimise the inner product between a cost parameter and the decision variable. While this is highly general, this reliance on an inner product structure belies the name of linear bandits, and fails to account for problems such as Optimal Transport. Using the Kantorovich formulation of Optimal Transport as an example, we show that an inner product structure is not necessary to achieve efficient learning in linear bandits. We propose a refinement of the classical OFUL algorithm that operates by embedding the action set into a Hilbertian subspace, where confidence sets can be built via least-squares estimation. Actions are then constrained to this subspace…
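To make the "linear" in the Kantorovich formulation concrete, here is a minimal sketch of the discrete case, where a coupling is a non-negative matrix with prescribed row and column sums and the transport cost is the sum of cost entries weighted by the coupling. This finite setting does have an inner product (the Frobenius one), so it only illustrates the linearity of the objective and not the general setting the paper targets. All names and numbers are hypothetical.

```python
def transport_cost(cost, coupling):
    """Kantorovich objective in the discrete case: sum_ij cost[i][j] * coupling[i][j]."""
    return sum(c * p
               for c_row, p_row in zip(cost, coupling)
               for c, p in zip(c_row, p_row))


def independent_coupling(mu, nu):
    """Product coupling mu (x) nu, which always has marginals mu and nu."""
    return [[m * n for n in nu] for m in mu]


def marginals(coupling):
    """Row sums and column sums of a coupling matrix."""
    rows = [sum(row) for row in coupling]
    cols = [sum(col) for col in zip(*coupling)]
    return rows, cols


def mix(p, q, a):
    """Convex combination a * p + (1 - a) * q of two couplings."""
    return [[a * x + (1 - a) * y for x, y in zip(p_row, q_row)]
            for p_row, q_row in zip(p, q)]


mu = [0.5, 0.5]
nu = [0.5, 0.5]
cost = [[0.0, 1.0],
        [1.0, 0.0]]

identity = [[0.5, 0.0],
            [0.0, 0.5]]                    # moves no mass off the diagonal
product = independent_coupling(mu, nu)     # spreads mass uniformly
blend = mix(identity, product, 0.3)        # still a valid coupling

# Linearity: the cost of the blend equals the blend of the costs.
lhs = transport_cost(cost, blend)
rhs = 0.3 * transport_cost(cost, identity) + 0.7 * transport_cost(cost, product)
```

Because the set of couplings is convex and the objective is linear in the coupling, the decision problem has the same shape as a linear bandit, with the unknown cost function playing the role of the parameter.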
