Adversarial Robustness in Two-Stage Learning-to-Defer: Algorithms and Guarantees

Yannis Montreuil; Axel Carlier; Lai Xing Ng; Wei Tsang Ooi

arXiv:2502.01027 · stat.ML · August 26, 2025

TL;DR

This paper studies adversarial attacks on two-stage Learning-to-Defer systems, introduces new attack strategies, and proposes SARD, a robust learning algorithm with theoretical guarantees that enhances security without sacrificing performance.

Contribution

This paper is the first to analyze adversarial robustness in two-stage L2D. It proposes novel attack methods and a provably robust learning algorithm that applies across multiple settings.

Findings

1. SARD significantly improves robustness against adversarial attacks.
2. SARD maintains strong performance on clean data.
3. The theoretical guarantees hold across various learning tasks.

Abstract

Two-stage Learning-to-Defer (L2D) enables optimal task delegation by assigning each input to either a fixed main model or one of several offline experts, supporting reliable decision-making in complex, multi-agent environments. However, existing L2D frameworks assume clean inputs and are vulnerable to adversarial perturbations that can manipulate query allocation, causing costly misrouting or expert overload. We present the first comprehensive study of adversarial robustness in two-stage L2D systems. We introduce two novel attack strategies, untargeted and targeted, which respectively disrupt optimal allocations or force queries to specific agents. To defend against such threats, we propose SARD, a convex learning algorithm built on a family of surrogate losses that are provably Bayes-consistent and $(R, G)$-consistent. These guarantees hold across classification,…
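
The routing step and the two attack types described in the abstract can be illustrated with a small sketch. It assumes a hypothetical linear rejector that scores the main model (index 0) and J offline experts (indices 1..J) and routes each input to the highest-scoring agent. The attack is a generic L∞-bounded projected-gradient heuristic on the score margin: the untargeted variant pushes the input away from its clean allocation, and the targeted variant pushes it toward a chosen agent. The names `allocate`, `margin_grad`, and `pgd_attack` are illustrative only; this is not the paper's exact attack formulation and it does not implement SARD.

```python
import numpy as np


def allocate(W, b, x):
    """Route x to the agent with the highest rejector score.

    Assumed convention: row 0 of W scores the main model,
    rows 1..J score the offline experts.
    """
    return int(np.argmax(W @ x + b))


def margin_grad(W, b, x, t):
    """Margin s_t(x) - max_{j != t} s_j(x) and its subgradient w.r.t. x
    for a hypothetical linear rejector s(x) = W x + b."""
    s = W @ x + b
    others = np.delete(np.arange(len(s)), t)
    k = others[np.argmax(s[others])]  # strongest competing agent
    return s[t] - s[k], W[t] - W[k]


def pgd_attack(W, b, x, eps, steps=20, alpha=None, target=None):
    """L_inf-bounded projected-gradient attack on the routing decision.

    target=None: untargeted, shrink the margin of the clean allocation.
    target=t:    targeted, grow the margin of agent t.
    """
    alpha = eps / 4 if alpha is None else alpha
    clean = allocate(W, b, x)
    x_adv = x.copy()
    for _ in range(steps):
        if target is None:
            # Step against the clean agent's margin gradient.
            _, g = margin_grad(W, b, x_adv, clean)
            x_adv = x_adv - alpha * np.sign(g)
        else:
            # Step along the target agent's margin gradient.
            _, g = margin_grad(W, b, x_adv, target)
            x_adv = x_adv + alpha * np.sign(g)
        # Project back onto the eps-ball around the clean input.
        x_adv = np.clip(x_adv, x - eps, x + eps)
    return x_adv


if __name__ == "__main__":
    # Toy rejector: 3 agents (main model + 2 experts), 2-D inputs.
    W = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]])
    b = np.zeros(3)
    x = np.array([1.0, 0.5])
    print(allocate(W, b, x))                                       # → 0
    print(allocate(W, b, pgd_attack(W, b, x, eps=0.5, target=1)))  # → 1
```

In this toy setting a perturbation of size at most 0.5 per coordinate is enough to move the query from the main model to expert 1, which is the kind of misrouting a robust rejector is meant to resist. Using the sign of the gradient is the standard choice for an L∞ budget, since it takes the largest step the box constraint allows in every coordinate.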

Taxonomy

Topics: Physical Unclonable Functions (PUFs) and Hardware Security · Adversarial Robustness in Machine Learning · VLSI and Analog Circuit Testing