Confidence-aware Denoised Fine-tuning of Off-the-shelf Models for Certified Robustness
Suhyeok Jang, Seojin Kim, Jinwoo Shin, Jongheon Jeong

TL;DR
This paper introduces FT-CADIS, a fine-tuning method that improves the certified robustness of off-the-shelf models against adversarial attacks by selectively training on confidence-filtered denoised images.
Contribution
The paper proposes a confidence-aware fine-tuning scheme that improves the certified robustness of denoised smoothing by filtering out hallucinated denoised images and updating only a small fraction of the model's parameters.
Findings
FT-CADIS achieves state-of-the-art certified robustness among denoised smoothing methods across benchmarks.
Selective fine-tuning on confidence-filtered images improves robustness.
The method requires updating only a small fraction of model parameters.
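The confidence-based filtering mentioned above can be illustrated with a minimal sketch. It assumes a hypothetical rule: a denoised image is kept for fine-tuning only if the classifier's softmax probability for the originally assigned label reaches a threshold. The function names, the threshold value, and the toy logits are illustrative assumptions, not the paper's exact criterion.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def select_confident(logits_per_image, label, threshold):
    """Return indices of denoised images whose softmax probability
    for `label` is at least `threshold` (hypothetical selection rule)."""
    kept = []
    for i, logits in enumerate(logits_per_image):
        if softmax(logits)[label] >= threshold:
            kept.append(i)
    return kept

# Toy classifier logits for three denoised copies of one class-0 image.
logits_per_image = [
    [4.0, 0.0, 0.0],  # confident in class 0: kept
    [0.0, 3.0, 0.0],  # confident in class 1: treated as hallucinated
    [1.0, 0.9, 0.8],  # near-uniform: too uncertain to keep
]
kept = select_confident(logits_per_image, label=0, threshold=0.5)
print(kept)
```

In this sketch only the first denoised copy would be passed on to the fine-tuning loss; the other two are dropped because the classifier no longer assigns them to the original class with sufficient confidence.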
Abstract
The remarkable advances in deep learning have led to the emergence of many off-the-shelf classifiers, e.g., large pre-trained models. However, since they are typically trained on clean data, they remain vulnerable to adversarial attacks. Despite this vulnerability, their superior performance and transferability keep off-the-shelf classifiers valuable in practice, demanding further work to provide adversarial robustness for them in a post-hoc manner. A recently proposed method, denoised smoothing, places a denoiser model in front of the classifier to obtain provable robustness without additional training. However, the denoiser often hallucinates, i.e., produces images that have lost the semantics of their originally assigned class, leading to a drop in robustness. Furthermore, its noise-and-denoise procedure introduces a significant distribution shift from the original…
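The noise-and-denoise pipeline described in the abstract can be sketched as follows: Gaussian noise is added to the input, the denoiser is applied, the classifier labels the result, and the smoothed prediction is the majority vote over many noise draws. This is a minimal illustration only; `toy_denoise` and `toy_classify` are hypothetical stand-ins for a real denoiser and an off-the-shelf classifier, and the vote share shown is an empirical frequency, not a certified bound.

```python
import random
from collections import Counter

def smoothed_predict(classify, denoise, x, sigma, n_samples, rng):
    """Majority vote of classify(denoise(x + noise)) over Gaussian noise draws."""
    votes = Counter()
    for _ in range(n_samples):
        noisy = [v + rng.gauss(0.0, sigma) for v in x]
        votes[classify(denoise(noisy))] += 1
    label, count = votes.most_common(1)[0]
    return label, count / n_samples

def toy_denoise(noisy):
    """Hypothetical denoiser: replaces every pixel with the mean value."""
    mean = sum(noisy) / len(noisy)
    return [mean] * len(noisy)

def toy_classify(x):
    """Hypothetical classifier: class 1 if the mean pixel is positive, else 0."""
    return 1 if sum(x) / len(x) > 0.0 else 0

rng = random.Random(0)
label, vote_share = smoothed_predict(
    toy_classify, toy_denoise, [0.5] * 64, sigma=0.25, n_samples=200, rng=rng
)
print(label, vote_share)
```

A real certificate would additionally convert the vote counts into a statistical lower bound on the top-class probability before deriving a robustness radius; that step is omitted here.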
Taxonomy
Topics: Fault Detection and Control Systems
Methods: Diffusion · Denoised Smoothing
