Beyond Static Bias: Adaptive Multi-Fidelity Bandits with Improving Proxies
Muyun Lu, Haoyang Hong, Huazheng Wang, Ying Lin

TL;DR
This paper extends multi-fidelity bandit models to include improving proxies, proposing an adaptive algorithm that decides when to continue low-fidelity sampling or escalate to high-fidelity evaluations, with theoretical and empirical validation.
Contribution
The paper introduces a dynamic model for improving proxies in multi-fidelity bandits and proposes TACC, an adaptive algorithm with regret guarantees for cost-effective exploration.
Findings
TACC effectively balances low- and high-fidelity sampling in experiments.
Adaptive continuation reduces regret compared to fixed strategies.
Theoretical bounds demonstrate improved efficiency for intermediate arms.
Abstract
As an extension of the classical multi-armed bandit problem, multi-fidelity multi-armed bandits (MF-MAB) enable individual arms to be evaluated using diverse feedback sources that vary in both cost and accuracy. Prior stochastic models typically assume fixed low-to-high fidelity discrepancies, whereas modern proxy sources, such as learning-based simulators and Large Language Models (LLMs), can be improved using additional calibration. We investigate adaptive MF-MAB with improving proxy sources, and focus on the canonical two-fidelity case in which the low-fidelity source becomes more informative with repeated use. To capture this dynamic, we introduce a selected-average mismatch bound that converts dynamic low-fidelity observations into improvement-aware confidence bounds for the high-fidelity target. We propose the Threshold-Based Adaptive Continuation Companion (TACC), an optimistic…
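The abstract describes two mechanisms. The first turns drifting low-fidelity observations into confidence bounds on the high-fidelity mean. The second is a threshold-based choice between continuing cheap sampling and escalating to high fidelity. The Python sketch below shows one way such a pair could be built under stated assumptions. It is not TACC, and it is not the authors' bound.

The sketch assumes:
- rewards lie in [0, 1] and samples are independent;
- each low-fidelity sample `s` has a known mismatch bound `mismatch[s]` on its mean versus the high-fidelity mean;
- the statistical width is a Hoeffding term.

The escalation rule, the threshold `tau`, and the function names `improvement_aware_bounds` and `continue_or_escalate` are all hypothetical.

```python
import math
from typing import List, Tuple


def improvement_aware_bounds(
    low_obs: List[float], mismatch: List[float], delta: float
) -> Tuple[float, float]:
    """Hypothetical confidence interval for the high-fidelity mean from low-fidelity samples.

    Illustrative assumptions (not from the paper): rewards lie in [0, 1], samples are
    independent, and the mean of the s-th low-fidelity sample is within mismatch[s]
    of the high-fidelity mean (mismatch[s] shrinks as the proxy improves).
    """
    n = len(low_obs)
    if n == 0 or len(mismatch) != n:
        raise ValueError("need at least one observation and one mismatch bound per observation")
    mean = sum(low_obs) / n
    # Averaging the per-sample bounds over the selected samples bounds the bias of the mean.
    avg_mismatch = sum(mismatch) / n
    # Hoeffding deviation for independent [0, 1] samples (means may differ) at level 1 - delta.
    stat_width = math.sqrt(math.log(2.0 / delta) / (2.0 * n))
    width = stat_width + avg_mismatch
    return mean - width, mean + width


def continue_or_escalate(
    low_obs: List[float], mismatch: List[float], delta: float, tau: float
) -> str:
    """Hypothetical threshold rule: 'resolved', 'continue_low', or 'escalate'."""
    lcb, ucb = improvement_aware_bounds(low_obs, mismatch, delta)
    if lcb >= tau or ucb < tau:
        return "resolved"  # the interval already lies on one side of tau
    mean = sum(low_obs) / len(low_obs)
    avg_mismatch = sum(mismatch) / len(mismatch)
    # If the bias term alone spans the distance to tau, shrinking the statistical
    # width with more cheap samples cannot separate the arm from tau: escalate.
    if avg_mismatch >= abs(mean - tau):
        return "escalate"
    return "continue_low"


if __name__ == "__main__":
    improving = [0.3 / math.sqrt(s + 1) for s in range(100)]
    print(continue_or_escalate([0.6] * 100, improving, 0.05, 0.3))
    print(continue_or_escalate([0.5] * 10, [0.2] * 10, 0.05, 0.6))
    print(continue_or_escalate([0.5] * 10, [0.05] * 10, 0.05, 0.6))
```

The three example calls behave differently:
- A proxy whose mismatch bound decays like 0.3/√s lets 100 low-fidelity samples resolve an arm that sits far from the threshold.
- A constant mismatch of 0.2 on an arm close to the threshold triggers escalation.
- A small constant mismatch of 0.05 keeps sampling at low fidelity.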
