Learning-Augmented Algorithms for MTS with Bandit Access to Multiple Predictors
Matei Gabriel Coşa, Marek Eliáš

TL;DR
This paper introduces algorithms for Metrical Task Systems (MTS) that may query only one of several heuristics per time step (bandit access), yet achieve cost comparable to that of the best heuristic on the given input.
Contribution
It develops learning-augmented algorithms for MTS with bandit access to multiple heuristics, providing regret bounds and matching lower bounds.
Findings
Achieves regret of O(OPT^{2/3})
Proves a matching lower bound, based on the construction of Dekel et al. (2013)
Shows that querying a single heuristic per time step suffices to compete with the best heuristic
Abstract
We consider the following problem: we are given heuristics for Metrical Task Systems (MTS), each of which may be tailored to a different type of input instance. While processing an input instance received online, we are allowed to query the action of only one of the heuristics at each time step. Our goal is to achieve performance comparable to the best of the given heuristics. The main difficulty of our setting comes from the fact that the cost paid by a heuristic at time t cannot be estimated unless the same heuristic was also queried at time t-1. This is related to bandit learning against memory-bounded adversaries (Arora et al., 2012). We show how to achieve regret of O(OPT^{2/3}) and prove a tight lower bound based on the construction of Dekel et al. (2013).
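To make the difficulty concrete, here is a minimal Python sketch under toy assumptions: states are points on the real line with distance |x - y|, and the service cost at each step is given as one function per time step. The helper `heuristic_cost` shows why a heuristic's cost at time t needs its state at time t-1 (the movement term). The hypothetical `blocked_exp3` illustrates the generic blocking idea associated with Arora et al. (2012): commit to one heuristic per block, skip evaluation at the first step of each block (where the previous state of that heuristic is unknown), and feed the remaining observed cost to the EXP3 bandit algorithm. This is not the algorithm from the paper; the names `block_len`, `eta`, and `max_cost` are illustrative assumptions.

```python
import math
import random
from typing import Callable, List, Sequence

State = float  # hypothetical: states are points on the real line
Service = Callable[[State], float]  # service cost function c_t(x) at step t


def heuristic_cost(states: Sequence[State], service: Sequence[Service], t: int) -> float:
    """Cost paid at step t >= 1 by an agent with trajectory `states`:
    movement |s_t - s_{t-1}| plus service cost c_t(s_t).
    The movement term needs s_{t-1}, so a single query at time t
    is not enough to evaluate a heuristic's cost at time t."""
    return abs(states[t] - states[t - 1]) + service[t](states[t])


def blocked_exp3(
    trajectories: List[Sequence[State]],
    service: Sequence[Service],
    block_len: int,
    eta: float,
    max_cost: float,
    seed: int = 0,
) -> float:
    """Follow one heuristic per block, chosen by loss-based EXP3.

    trajectories[i][t] is the state heuristic i would report if queried at t;
    only the chosen heuristic is read at each step (bandit access).
    Assumes every per-step heuristic cost is at most max_cost.
    Returns the total cost paid by the resulting algorithm.
    """
    rng = random.Random(seed)
    k = len(trajectories)
    horizon = len(service)
    est_loss = [0.0] * k  # importance-weighted cumulative loss estimates
    position = trajectories[0][0]  # hypothetical common start state at t = 0
    total = 0.0
    for start in range(1, horizon, block_len):
        # EXP3 distribution; shifting by the minimum avoids overflow.
        m = min(est_loss)
        weights = [math.exp(-eta * (x - m)) for x in est_loss]
        z = sum(weights)
        probs = [w / z for w in weights]
        i = rng.choices(range(k), weights=probs)[0]
        end = min(start + block_len, horizon)
        observed = []
        for t in range(start, end):
            new_pos = trajectories[i][t]  # the single query allowed at step t
            total += abs(new_pos - position) + service[t](new_pos)
            position = new_pos
            if t > start:
                # Heuristic i was also queried at t-1, so its cost at t is known.
                observed.append(heuristic_cost(trajectories[i], service, t))
        if observed:  # a block of length 1 yields no usable feedback
            loss = sum(observed) / (len(observed) * max_cost)  # scaled to [0, 1]
            est_loss[i] += loss / probs[i]
    return total
```

With a single heuristic the sketch simply follows it. With several, EXP3 shifts probability toward heuristics whose observed per-block cost is low, while each switch between blocks pays the movement cost to the new heuristic's state; longer blocks mean fewer discarded observations and fewer switches, but slower adaptation.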
Taxonomy
Topics: Advanced Bandit Algorithms Research · Machine Learning and Algorithms · Optimization and Search Problems
