Online Learning-to-Defer with Varying Experts
Dang Hoang Duy, Yannis Montreuil, Maxime Meyer, Axel Carlier, Lai Xing Ng, Wei Tsang Ooi

TL;DR
This paper introduces an online Learning-to-Defer algorithm for multiclass classification that handles streaming data, varying expert availability, and shifting expert distributions, with theoretical regret guarantees.
Contribution
It presents the first online L2D algorithm with regret bounds for dynamic expert pools and demonstrates its effectiveness on synthetic and real datasets.
Findings
Achieves regret of O((n+n_e)T^{2/3}) in general settings.
Achieves regret of O((n+n_e)√T) under low-noise conditions.
Effectively extends standard L2D to dynamic expert availability scenarios.
Abstract
Learning-to-Defer (L2D) methods route each query either to a predictive model or to external experts. While existing work studies this problem in batch settings, real-world deployments require handling streaming data, changing expert availability, and shifting expert distribution. We introduce the first online L2D algorithm for multiclass classification with bandit feedback and a dynamically varying pool of experts. Our method achieves regret guarantees of O((n+n_e)T^{2/3}) in general and O((n+n_e)√T) under a low-noise condition, where T is the time horizon, n is the number of labels, and n_e is the number of distinct experts observed across rounds. The analysis builds on novel H-consistency bounds for the online framework, combined with first-order methods for online convex optimization. Experiments on synthetic and real-world datasets demonstrate that our…
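To make the setting concrete, here is a minimal Python sketch of the interaction protocol the abstract describes: in each round some set of agents is available, the learner routes the query to one of them, and it only sees whether that one agent was correct (bandit feedback). The routing rule is a plain epsilon-greedy heuristic over running accuracies, swapped in purely for illustration; it is not the paper's algorithm, which relies on surrogate losses and online convex optimization. The function name `run_online_deferral`, the agent names, and the assumption that the model is available in every round are all hypothetical.

```python
import random


def run_online_deferral(rounds, epsilon=0.1, seed=0):
    """Toy online deferral loop with bandit feedback (illustrative only).

    rounds: iterable of (available, correct) pairs, where `available` is the
    set of agents present this round (hypothetically the model plus whichever
    experts are online) and `correct` maps each available agent to a bool.
    Only the chosen agent's entry in `correct` is read, which mimics bandit
    feedback.
    """
    rng = random.Random(seed)
    pulls, wins = {}, {}  # per-agent counts; experts are added when first seen
    choices, mistakes = [], 0

    for available, correct in rounds:
        available = sorted(available)  # fixed order for reproducibility
        untried = [a for a in available if pulls.get(a, 0) == 0]
        if untried:
            # Query each newly seen agent once before comparing accuracies.
            choice = untried[0]
        elif rng.random() < epsilon:
            # Explore: pick uniformly among the agents available right now.
            choice = rng.choice(available)
        else:
            # Exploit: pick the available agent with the best running accuracy.
            choice = max(available, key=lambda a: wins[a] / pulls[a])

        outcome = correct[choice]  # the only feedback the learner observes
        pulls[choice] = pulls.get(choice, 0) + 1
        wins[choice] = wins.get(choice, 0) + int(outcome)
        mistakes += int(not outcome)
        choices.append(choice)

    return choices, mistakes


# Hypothetical four-round stream in which the expert pool changes over time.
demo_rounds = [
    ({"model", "expert_a"}, {"model": False, "expert_a": True}),
    ({"model", "expert_a"}, {"model": False, "expert_a": True}),
    ({"model", "expert_b"}, {"model": False, "expert_b": True}),
    ({"model", "expert_a"}, {"model": False, "expert_a": True}),
]
demo_choices, demo_mistakes = run_online_deferral(demo_rounds, epsilon=0.0)
```

With `epsilon=0.0` the loop is deterministic: it tries each newly seen agent once and then defers to whichever available agent has the best observed accuracy. The sketch ignores the query features entirely, so it illustrates only the varying-pool and bandit-feedback aspects of the problem, not how a deferral policy would be learned from inputs.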
