Bandit Online Optimization Over the Permutahedron
Nir Ailon, Kohei Hatano, Eiji Takimoto

TL;DR
This paper introduces an efficient algorithm for bandit online optimization over the permutahedron, achieving O(n^{3/2}√T) regret in O(n^3 T) total time. It avoids the running time super-linear in T of the earlier CombBand algorithm, which makes large time horizons practical.
Contribution
The paper presents a computationally efficient algorithm with an O(n^{3/2}√T) regret bound for bandit optimization over the permutahedron, combining existing approaches with a novel variance analysis.
Findings
Achieves regret of O(n^{3/2}√T) with O(n^3 T) time complexity.
Provides a variance bound for the Plackett-Luce noisy sorting process.
Improves practicality of bandit optimization over the permutahedron for large T.
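As a rough illustration of the Plackett-Luce noisy sorting process named in the findings, the sketch below samples a ranking by repeatedly drawing one of the remaining items with probability proportional to exp(score). This is the standard textbook form of the model; the function name, the score parameterization, and the use of Python's random module are assumptions made for illustration and are not taken from the paper.

```python
import math
import random


def sample_plackett_luce(scores, rng=random):
    """Sample a ranking (list of item indices, first place first).

    Assumption: item i is chosen among the remaining items with
    probability proportional to exp(scores[i]).
    """
    remaining = list(range(len(scores)))
    ranking = []
    while remaining:
        # Subtract the max score before exponentiating to avoid overflow.
        top = max(scores[i] for i in remaining)
        weights = [math.exp(scores[i] - top) for i in remaining]
        pick = rng.choices(remaining, weights=weights, k=1)[0]
        ranking.append(pick)
        remaining.remove(pick)
    return ranking


if __name__ == "__main__":
    rng = random.Random(0)
    print(sample_plackett_luce([0.0, 1.0, 2.0], rng))
```

Items with higher scores tend to appear earlier, but any ordering has positive probability, which is what makes the process a "noisy" sort.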
Abstract
The permutahedron is the convex polytope with vertex set consisting of the vectors (π(1), ..., π(n)) for all permutations (bijections) π over {1, ..., n}. We study a bandit game in which, at each step t, an adversary chooses a hidden weight vector s_t, a player chooses a vertex π_t of the permutahedron and suffers an observed loss of Σ_{i=1}^n π_t(i) s_t(i). A previous algorithm CombBand of Cesa-Bianchi et al. (2009) guarantees a regret of O(n√(T log n)) for a time horizon of T. Unfortunately, CombBand requires at each step an n-by-n matrix permanent approximation to within improved accuracy as T grows, resulting in a total running time that is super linear in T, making it impractical for large time horizons. We provide an algorithm of regret O(n^{3/2}√T) with total time complexity O(n^3 T). The ideas are a combination of…
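To make the game in the abstract concrete, here is a minimal sketch that takes the per-step loss to be the inner product of the chosen vertex and the weight vector, and computes regret against the best fixed permutation in hindsight. The helper names (loss, best_fixed_permutation, regret) and the toy numbers are hypothetical; the code only evaluates the game and does not implement the paper's algorithm.

```python
def loss(perm, s):
    """Loss of a vertex perm = (pi(1), ..., pi(n)) against weights s."""
    return sum(p * w for p, w in zip(perm, s))


def best_fixed_permutation(weight_vectors):
    """Vertex minimizing the cumulative loss in hindsight.

    By the rearrangement inequality, the minimizer gives position 1 to
    the coordinate with the largest cumulative weight, position 2 to
    the next largest, and so on.
    """
    n = len(weight_vectors[0])
    totals = [sum(s[i] for s in weight_vectors) for i in range(n)]
    order = sorted(range(n), key=lambda i: -totals[i])
    perm = [0] * n
    for position, i in enumerate(order, start=1):
        perm[i] = position
    return perm


def regret(plays, weight_vectors):
    """Player's cumulative loss minus that of the best fixed vertex."""
    player = sum(loss(p, s) for p, s in zip(plays, weight_vectors))
    best = best_fixed_permutation(weight_vectors)
    return player - sum(loss(best, s) for s in weight_vectors)


if __name__ == "__main__":
    weights = [[0.1, 0.9, 0.5], [0.2, 0.8, 0.4]]  # toy adversary choices
    plays = [[1, 2, 3], [3, 1, 2]]                # toy player choices
    print(best_fixed_permutation(weights), regret(plays, weights))
```

In the bandit setting the player would only see the scalar returned by loss at each step, not the weight vector itself; the hindsight comparison is used purely to define regret.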
Taxonomy
Topics: Advanced Bandit Algorithms Research · Optimization and Search Problems · Machine Learning and Algorithms
