Theoretical Analysis of Robust Overfitting for Wide DNNs: An NTK Approach
Shaopeng Fu, Di Wang

TL;DR
This paper provides a theoretical analysis of robust overfitting in adversarial training of wide DNNs using NTK theory, revealing a degeneration phenomenon and proposing a new AT algorithm for infinite-width networks.
Contribution
It extends NTK theory to adversarial training, explains robust overfitting phenomena, and introduces Adv-NTK, a novel AT method for infinite-width DNNs.
Findings
Long-term AT causes a wide DNN to degenerate to the model obtained without AT, which is not robust.
Adv-NTK improves robustness of infinite-width DNNs.
Theoretical analysis aligns with empirical results on real datasets.
Abstract
Adversarial training (AT) is a canonical method for enhancing the robustness of deep neural networks (DNNs). However, recent studies have empirically demonstrated that it suffers from robust overfitting, i.e., prolonged AT can be detrimental to the robustness of DNNs. This paper presents a theoretical explanation of robust overfitting for DNNs. Specifically, we non-trivially extend the neural tangent kernel (NTK) theory to AT and prove that an adversarially trained wide DNN can be well approximated by a linearized DNN. Moreover, for squared loss, closed-form AT dynamics for the linearized DNN can be derived, which reveal a new AT degeneration phenomenon: long-term AT causes a wide DNN to degenerate to the one obtained without AT, and thus causes robust overfitting. Based on our theoretical results, we further design a method named Adv-NTK, the first AT algorithm for infinite-width…
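To make the phrase "well approximated by a linearized DNN" concrete, here is a minimal sketch of the standard NTK linearization, f_lin(x; θ) = f(x; θ0) + ⟨∇θ f(x; θ0), θ − θ0⟩, together with the empirical tangent kernel it induces. The two-layer tanh network, its 1/sqrt(m) scaling, and all function names (`init_params`, `linearized_forward`, `empirical_ntk`, `perturb`) are assumptions chosen for illustration. They are not taken from the paper, and this is not the Adv-NTK algorithm.

```python
import math
import random


def init_params(d, m, seed=0):
    """Draw theta_0 = (W, a) with standard normal entries (illustrative choice)."""
    rng = random.Random(seed)
    W = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(m)]
    a = [rng.gauss(0.0, 1.0) for _ in range(m)]
    return W, a


def forward(params, x):
    """Toy network: f(x; theta) = (1 / sqrt(m)) * sum_j a_j * tanh(w_j . x)."""
    W, a = params
    pre = [sum(wi * xi for wi, xi in zip(w, x)) for w in W]
    return sum(aj * math.tanh(p) for aj, p in zip(a, pre)) / math.sqrt(len(a))


def grad(params, x):
    """Analytic gradient of f(x; theta) w.r.t. theta, in the same (W, a) layout."""
    W, a = params
    s = 1.0 / math.sqrt(len(a))
    pre = [sum(wi * xi for wi, xi in zip(w, x)) for w in W]
    gW = [[s * aj * (1.0 - math.tanh(p) ** 2) * xi for xi in x]
          for aj, p in zip(a, pre)]
    ga = [s * math.tanh(p) for p in pre]
    return gW, ga


def flatten(params):
    """Concatenate (W, a) into one flat parameter vector."""
    W, a = params
    return [v for row in W for v in row] + list(a)


def linearized_forward(params0, params, x):
    """First-order Taylor expansion of f in theta around theta_0."""
    g = flatten(grad(params0, x))
    delta = [t - t0 for t, t0 in zip(flatten(params), flatten(params0))]
    return forward(params0, x) + sum(gi * di for gi, di in zip(g, delta))


def empirical_ntk(params0, x1, x2):
    """Empirical tangent kernel: inner product of parameter gradients at theta_0."""
    g1 = flatten(grad(params0, x1))
    g2 = flatten(grad(params0, x2))
    return sum(u * v for u, v in zip(g1, g2))


def perturb(params, scale, seed=1):
    """Return a copy of params with small Gaussian noise added to every entry."""
    rng = random.Random(seed)
    W, a = params
    W2 = [[v + scale * rng.gauss(0.0, 1.0) for v in row] for row in W]
    a2 = [v + scale * rng.gauss(0.0, 1.0) for v in a]
    return W2, a2


if __name__ == "__main__":
    theta0 = init_params(d=3, m=64, seed=0)
    theta = perturb(theta0, scale=1e-3)
    x = [0.5, -0.2, 0.1]
    print("network   :", forward(theta, x))
    print("linearized:", linearized_forward(theta0, theta, x))
    print("NTK(x, x) :", empirical_ntk(theta0, x, x))
```

For parameters close to θ0 the two printed predictions should nearly coincide. NTK-style analyses, including the one summarized in the abstract, rely on this kind of approximation staying accurate throughout training when the network is sufficiently wide.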
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Anomaly Detection Techniques and Applications · Model Reduction and Neural Networks
