Adversarial Training Reduces Information and Improves Transferability
Matteo Terzi, Alessandro Achille, Marco Maggipinto, Gian Antonio Susto

TL;DR
This paper shows that adversarial training improves the transferability and invertibility of neural network features, at the cost of a trade-off with accuracy on the source task, and that it reduces Fisher information. The claims are supported by theoretical and empirical evidence.
Contribution
The paper investigates the dual relationship between adversarial training and information theory, demonstrates improved transferability and invertibility of features, and introduces methods to enhance image reconstruction quality.
Findings
Adversarial training improves transferability of features.
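Adversarial training reduces the Fisher information of representations about the input, and of the weights.
A theoretical argument explains how features can be invertible without violating the principle that a classifier should capture only the minimal information required for the task.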
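To make the term "adversarial training" in the findings above concrete, here is a minimal sketch on a toy logistic-regression model. It is not the authors' setup: the single-step FGSM attack is swapped in for simplicity, and the function names, the L-infinity budget `epsilon`, the learning rate, and the step count are all illustrative assumptions. At each step the inputs are perturbed to increase the loss, and the weights are then updated on the perturbed inputs.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss(w, x, y):
    # Mean binary cross-entropy of logistic regression, labels y in {0, 1}.
    p = sigmoid(x @ w)
    tiny = 1e-12  # guards against log(0)
    return float(-np.mean(y * np.log(p + tiny) + (1 - y) * np.log(1 - p + tiny)))

def fgsm(w, x, y, epsilon):
    # Single-step attack (assumed here for simplicity): move each input by
    # epsilon along the sign of the loss gradient with respect to the input.
    # For logistic regression that gradient is (p - y) * w per example.
    p = sigmoid(x @ w)
    grad_x = (p - y)[:, None] * w[None, :]
    return x + epsilon * np.sign(grad_x)

def adversarial_train(x, y, epsilon=0.1, lr=0.5, steps=200):
    # Hypothetical training loop: gradient descent on the loss evaluated
    # at adversarially perturbed inputs instead of the clean inputs.
    w = np.zeros(x.shape[1])
    for _ in range(steps):
        x_adv = fgsm(w, x, y, epsilon)          # inner step: perturb inputs
        p = sigmoid(x_adv @ w)
        grad_w = x_adv.T @ (p - y) / len(y)     # outer step: update weights
        w -= lr * grad_w
    return w

# Toy usage: synthetic data where the label depends on the first coordinate.
rng = np.random.default_rng(0)
x = rng.normal(size=(64, 5))
y = (x[:, 0] > 0).astype(float)
w = adversarial_train(x, y, epsilon=0.1)
```

Experiments at the scale of CIFAR-10 or ImageNet would typically use a deep network and a multi-step attack; the toy model only shows the structure of the loop.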
Abstract
Recent results show that features of adversarially trained networks for classification, in addition to being robust, enable desirable properties such as invertibility. The latter property may seem counter-intuitive as it is widely accepted by the community that classification models should only capture the minimal information (features) required for the task. Motivated by this discrepancy, we investigate the dual relationship between Adversarial Training and Information Theory. We show that Adversarial Training can improve linear transferability to new tasks, from which arises a new trade-off between transferability of representations and accuracy on the source task. We validate our results on several datasets, employing robust networks trained on CIFAR-10, CIFAR-100 and ImageNet. Moreover, we show that Adversarial Training reduces Fisher information of representations about the…
