TL;DR
This paper introduces ROSS, a post-hoc OOD detector that applies median smoothing to a baseline score and measures how unstable that score is under noise, improving robustness against adversarial attacks and reporting state-of-the-art results.
Contribution
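The paper proposes ROSS, a robust post-hoc OOD detector that reuses the noisy samples drawn for median smoothing to quantify the instability of the baseline score, and uses that instability to better separate ID from OOD samples. It reports gains of up to 40 AUROC points over prior methods.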
Findings
ROSS improves AUROC by up to 40 points over previous methods.
It performs strongly against both score-minimising and score-maximising attacks.
Extensive experiments on CIFAR-10, CIFAR-100, and ImageNet validate its effectiveness.
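For readers unfamiliar with the metric behind these findings, here is a minimal sketch of how AUROC can be computed for an OOD detector. It assumes the convention that a higher score means "more in-distribution"; the function name `auroc` and the toy scores are illustrative and not taken from the paper. On this 0 to 1 scale, a gain of 40 AUROC points corresponds to a difference of 0.40.

```python
import numpy as np

def auroc(id_scores, ood_scores):
    """AUROC as the probability that a random ID sample outscores a random OOD sample.

    Assumes higher score = more in-distribution; ties count as half a win.
    """
    id_scores = np.asarray(id_scores, dtype=float)
    ood_scores = np.asarray(ood_scores, dtype=float)
    # Compare every ID score against every OOD score.
    wins = (id_scores[:, None] > ood_scores[None, :]).sum()
    ties = (id_scores[:, None] == ood_scores[None, :]).sum()
    return float((wins + 0.5 * ties) / (id_scores.size * ood_scores.size))

# Toy scores (made up for illustration only).
perfect = auroc([0.9, 0.8], [0.1, 0.2])   # every ID sample outscores every OOD sample
chance = auroc([1.0, 2.0], [1.0, 2.0])    # identical score distributions
```

A detector that separates the two sets perfectly scores 1.0, and one that cannot tell them apart scores 0.5.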
Abstract
Reliable out-of-distribution (OOD) detection is a critical requirement for the safe deployment of machine learning systems. Despite recent progress, state-of-the-art OOD detectors are highly susceptible to adversarial attacks, which undermines their trustworthiness in automated systems. To address this vulnerability, we apply median smoothing to baseline OOD detection scores, balancing clean and adversarial accuracies. Our key insight is that the noisy samples generated for median smoothing can be repurposed to quantify the local instability of the base score. We observe that OOD samples exhibit higher instability under perturbation. Based on this, we propose ROSS, a novel and robust post-hoc OOD detector that leverages the instability of baseline scores to further distinguish between in-distribution (ID) and OOD samples. ROSS achieves symmetric robustness, performing strongly against…
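The idea in the abstract can be sketched as follows. This is a hedged illustration, not the authors' implementation: the abstract does not specify the instability measure or how it is combined with the smoothed score, so the interquartile range, the linear combination, and the names `smoothed_score_and_instability`, `combined_score`, `sigma`, `n_samples`, and `weight` are all assumptions made for this example.

```python
import numpy as np

def smoothed_score_and_instability(score_fn, x, sigma=0.1, n_samples=64, seed=0):
    """Median-smooth a base OOD score and reuse the same noisy samples
    to estimate how unstable the score is around x.

    score_fn: hypothetical base detector mapping an input array to a scalar score.
    """
    rng = np.random.default_rng(seed)
    # Gaussian perturbations of the input, one per noisy sample.
    noise = rng.normal(0.0, sigma, size=(n_samples,) + x.shape)
    noisy_scores = np.array([score_fn(x + eps) for eps in noise])
    median = float(np.median(noisy_scores))
    # Assumption: interquartile range as the instability measure.
    q25, q75 = np.percentile(noisy_scores, [25, 75])
    return median, float(q75 - q25)

def combined_score(median, instability, weight=1.0):
    # Assumption: higher score = more in-distribution, so instability is penalised.
    return median - weight * instability

# Toy base scores: the second reacts ten times more strongly to perturbations.
x = np.zeros(8)
flat_median, flat_instab = smoothed_score_and_instability(lambda v: 1.0, x)
mild_median, mild_instab = smoothed_score_and_instability(lambda v: float(v.sum()), x)
steep_median, steep_instab = smoothed_score_and_instability(lambda v: 10.0 * float(v.sum()), x)
```

Because the same noisy samples feed both the median and the spread estimate, the instability term adds no extra forward passes beyond those already needed for smoothing.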
