Learning-to-Defer with Expert-Conditional Advice
Yannis Montreuil, Leïna Montreuil, Axel Carlier, Lai Xing Ng, Wei Tsang Ooi

TL;DR
This paper extends Learning-to-Defer so that, after an expert is selected, the system can also choose which additional advice that expert receives.
Contribution
It introduces an augmented surrogate method that models expert-advice combinations and provides theoretical guarantees for optimal policy recovery.
Findings
The augmented surrogate outperforms standard methods across multiple tasks.
Separated surrogates are shown to be inconsistent even in simple settings.
The method adapts advice acquisition based on cost regimes.
Abstract
Learning-to-Defer routes each input to the expert that minimizes expected cost, but it assumes that the information available to every expert is fixed at decision time. Many modern systems violate this assumption: after selecting an expert, one may also choose what additional information that expert should receive, such as retrieved documents, tool outputs, or escalation context. We study this problem and call it Learning-to-Defer with advice. We show that a broad family of natural separated surrogates, which learn routing and advice with distinct heads, is inconsistent even in the smallest non-trivial setting. We then introduce an augmented surrogate that operates on the composite expert-advice action space and prove an H-consistency guarantee together with an excess-risk transfer bound, yielding recovery of the Bayes-optimal policy in the limit. Experiments on tabular,…
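To make the composite expert-advice action space concrete, here is a minimal sketch, not the authors' implementation. It assumes a hypothetical table `expected_cost[j, a]` holding the estimated cost of routing an input to expert `j` with advice option `a`, and picks the pair with the lowest cost. The function names and the toy numbers are illustrative assumptions only.

```python
import numpy as np


def composite_actions(n_experts, n_advice):
    """Enumerate every (expert, advice) pair as one flat list of actions."""
    return [(j, a) for j in range(n_experts) for a in range(n_advice)]


def best_action(expected_cost):
    """Return the (expert, advice) pair with the lowest expected cost.

    expected_cost[j, a] is a hypothetical cost estimate for sending the
    input to expert j and supplying advice option a.
    """
    flat_index = int(np.argmin(expected_cost))  # argmin over all pairs jointly
    j, a = np.unravel_index(flat_index, expected_cost.shape)
    return int(j), int(a)


# Toy costs (made up for illustration): 2 experts, 2 advice options.
cost = np.array([[0.40, 0.35],
                 [0.50, 0.10]])

actions = composite_actions(*cost.shape)
chosen = best_action(cost)
print(actions)
print(chosen)
```

The point of the sketch is only that the decision is taken over pairs jointly, so a single head can score all `n_experts * n_advice` actions at once. How those scores are learned is the subject of the paper's surrogate and is not reproduced here.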
Taxonomy
Topics: Advanced Bandit Algorithms Research · Domain Adaptation and Few-Shot Learning · Adversarial Robustness in Machine Learning
