Probe-then-Commit Multi-Objective Bandits: Theoretical Benefits of Limited Multi-Arm Feedback

Ming Shi

arXiv:2602.03175 · cs.LG · February 23, 2026

TL;DR

This paper introduces a probe-then-commit algorithm for multi-objective bandits with limited multi-arm feedback and shows that probing up to $q$ arms per round shrinks the regret bounds by a factor of $1/\sqrt{q}$.

Contribution

It develops the PtC-P-UCB algorithm, whose core is frontier-aware probing, and provides regret bounds that quantify the benefits of limited multi-arm probing in multi-objective bandit problems.

Findings

01

Shrinks regret bounds by a factor of $1/\sqrt{q}$ when up to $q$ arms can be probed per round.

02

Extends to multi-modal probing with variance-adaptive bounds.

03

Provides theoretical guarantees for Pareto frontier exploration.
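As a heuristic illustration of where a $1/\sqrt{q}$ factor can come from (a back-of-the-envelope calculation under our own simplifying assumptions, not the paper's proof): suppose the $q$ probes are spread evenly, so that after $T$ rounds each of the $K$ arms has about $n = qT/K$ observations instead of $T/K$. Suppose also that the confidence width has the generic UCB form $\sqrt{2\log T / n}$. Then

$$\sqrt{\frac{2\log T}{qT/K}} = \frac{1}{\sqrt{q}}\sqrt{\frac{2K\log T}{T}},$$

so each per-arm width, and any regret bound built by summing such widths over rounds, shrinks by $1/\sqrt{q}$ relative to single-arm feedback ($q = 1$).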

Abstract

We study an online resource-selection problem motivated by multi-radio access selection and mobile edge computing offloading. In each round, an agent chooses among $K$ candidate links/servers (arms) whose performance is a stochastic $d$-dimensional vector (e.g., throughput, latency, energy, reliability). The key interaction is probe-then-commit (PtC): the agent may probe up to $q > 1$ candidates via control-plane measurements to observe their vector outcomes, but must execute exactly one candidate in the data plane. This limited multi-arm feedback regime strictly interpolates between classical bandits ($q = 1$) and full-information experts ($q = K$), yet existing multi-objective learning theory largely focuses on these extremes. We develop PtC-P-UCB, an optimistic probe-then-commit algorithm whose technical core is frontier-aware probing under uncertainty in a Pareto mode,…
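The probe-then-commit interaction described above can be made concrete with a small simulation. The following is a minimal sketch under our own assumptions and is not the paper's PtC-P-UCB. All $d$ objectives are maximized (latency or energy would be negated). The per-objective confidence bonus is a generic $\sqrt{2\log t / n}$. The probing rule, which takes optimistic Pareto-front arms first, least-probed first, and then fills the remaining slots, is a hypothetical stand-in for frontier-aware probing. The commit rule uses a hypothetical linear scalarization `weights`. The class `PtCSketch`, the helpers `dominates` and `pareto_front`, and the synthetic Gaussian environment are illustrative names only.

```python
import math
import random
from typing import Callable, List, Sequence


def dominates(u: Sequence[float], v: Sequence[float]) -> bool:
    # u Pareto-dominates v (all objectives maximized): >= everywhere, > somewhere.
    return all(a >= b for a, b in zip(u, v)) and any(a > b for a, b in zip(u, v))


def pareto_front(vectors: List[Sequence[float]]) -> List[int]:
    # Indices of vectors that no other vector dominates.
    return [i for i, v in enumerate(vectors)
            if not any(dominates(u, v) for j, u in enumerate(vectors) if j != i)]


class PtCSketch:
    """Hypothetical probe-then-commit learner; NOT the paper's PtC-P-UCB."""

    def __init__(self, K: int, d: int, q: int, weights: Sequence[float]):
        assert 1 <= q <= K and len(weights) == d
        self.K, self.d, self.q = K, d, q
        self.weights = list(weights)  # hypothetical scalarization used only to commit
        self.counts = [0] * K         # number of times each arm was probed
        self.sums = [[0.0] * d for _ in range(K)]
        self.t = 0

    def mean(self, k: int) -> List[float]:
        return [s / self.counts[k] for s in self.sums[k]]

    def ucb(self, k: int) -> List[float]:
        # Per-objective optimistic index; unprobed arms are maximally optimistic.
        if self.counts[k] == 0:
            return [math.inf] * self.d
        bonus = math.sqrt(2.0 * math.log(max(self.t, 2)) / self.counts[k])
        return [m + bonus for m in self.mean(k)]

    def choose_probes(self) -> List[int]:
        # Probe arms on the optimistic Pareto front first (least-probed first),
        # then fill any remaining slots with the least-probed other arms.
        front = pareto_front([self.ucb(k) for k in range(self.K)])
        probes = sorted(front, key=lambda k: self.counts[k])[: self.q]
        rest = sorted((k for k in range(self.K) if k not in probes),
                      key=lambda k: self.counts[k])
        return probes + rest[: self.q - len(probes)]

    def step(self, env: Callable[[int], List[float]]) -> int:
        self.t += 1
        probes = self.choose_probes()
        for k in probes:  # control plane: observe the full d-dim vector of each probe
            x = env(k)
            self.counts[k] += 1
            for j in range(self.d):
                self.sums[k][j] += x[j]
        # Data plane: execute exactly one of the probed arms.
        return max(probes, key=lambda k: sum(
            w * m for w, m in zip(self.weights, self.mean(k))))


# Usage: K = 5 arms, d = 2 objectives, q = 2 probes per round; arm 0 dominates.
rng = random.Random(0)
MEANS = [[0.9, 0.9], [0.5, 0.2], [0.2, 0.5], [0.4, 0.4], [0.1, 0.1]]
env = lambda k: [m + rng.gauss(0.0, 0.1) for m in MEANS[k]]
agent = PtCSketch(K=5, d=2, q=2, weights=[0.5, 0.5])
commits = [agent.step(env) for _ in range(2000)]
```

In this sketch, filling idle probe slots with the least-probed arms keeps every estimate improving even after the optimistic front shrinks to a single arm. Extra control-plane probes add observations without changing the rule that exactly one arm is executed per round.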

Taxonomy

Topics: Advanced Bandit Algorithms Research · Age of Information Optimization · IoT and Edge/Fog Computing