A Closer Look at the Adversarial Robustness of Information Bottleneck Models
Iryna Korshunova, David Stutz, Alexander A. Alemi, Olivia Wiles, Sven Gowal

TL;DR
This paper critically examines the adversarial robustness of information bottleneck models, revealing that they are not inherently robust and that earlier claims of improved defense were likely due to gradient obfuscation.
Contribution
It provides a comprehensive evaluation showing that information bottleneck models do not offer strong adversarial defenses when properly tested against white-box attacks.
Findings
Information bottleneck models are not robust against white-box $\ell_{\infty}$ attacks.
Previous claims of robustness may be due to gradient obfuscation.
Proper evaluation undermines earlier optimistic results.
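For readers unfamiliar with the attack family named in the findings, the following is a minimal sketch of a white-box $\ell_{\infty}$ projected gradient descent (PGD) attack. It is not the paper's evaluation setup: the toy linear logistic classifier, the function names (`logit`, `loss_grad_x`, `pgd_linf`), and the values of `eps`, `step`, and `iters` are all illustrative assumptions. "White-box" here means the attacker uses the exact gradient of the loss with respect to the input, which is why obfuscated gradients can make such an attack look weaker than it really is.

```python
import math

def logit(w, b, x):
    # Linear score of a toy binary classifier (hypothetical stand-in for a real model).
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def loss_grad_x(w, b, x, y):
    # Exact gradient of the logistic loss w.r.t. the input x, for a label y in {0, 1}.
    p = 1.0 / (1.0 + math.exp(-logit(w, b, x)))
    return [(p - y) * wi for wi in w]

def sign(v):
    return (v > 0) - (v < 0)

def pgd_linf(w, b, x, y, eps=0.3, step=0.05, iters=20):
    # Repeatedly ascend the loss along the sign of the gradient, then project
    # back into the l_inf ball of radius eps around the clean input x.
    x_adv = list(x)
    for _ in range(iters):
        g = loss_grad_x(w, b, x_adv, y)
        x_adv = [xa + step * sign(gi) for xa, gi in zip(x_adv, g)]
        x_adv = [min(max(xa, xi - eps), xi + eps) for xa, xi in zip(x_adv, x)]
    return x_adv

# Illustrative values only (assumptions, not taken from the paper).
w, b = [2.0, -1.0], 0.0
x, y = [0.2, 0.1], 1
x_adv = pgd_linf(w, b, x, y)
print("clean logit:", logit(w, b, x))
print("adversarial logit:", logit(w, b, x_adv))
```

In this toy case the clean input has a positive logit (predicted class 1), while the perturbed input stays within `eps` of the original in every coordinate yet has a negative logit. Evaluations of real models typically use many more iterations, random restarts, and several attack variants.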
Abstract
We study the adversarial robustness of information bottleneck models for classification. Previous works showed that the robustness of models trained with information bottlenecks can improve upon adversarial training. Our evaluation under a diverse range of white-box attacks suggests that information bottlenecks alone are not a strong defense strategy, and that previous results were likely influenced by gradient obfuscation.
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Anomaly Detection Techniques and Applications · Domain Adaptation and Few-Shot Learning
