Why Robust Generalization in Deep Learning is Difficult: Perspective of Expressive Power
Binghui Li, Jikai Jin, Han Zhong, John E. Hopcroft, Liwei Wang

TL;DR
This paper investigates why deep neural networks struggle with robust generalization. It shows that, because of limits on expressive power, achieving low robust error requires a network whose size is exponential in the data dimension.
Contribution
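It provides a theoretical analysis, from the perspective of expressive power, of the gap in network size between robust training and robust generalization: mild over-parameterization suffices for the former, while the latter requires size exponential in the data dimension.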
Findings
Robust training error can be near zero with mild over-parameterization.
Achieving low robust generalization error requires a network size that is exponential in the data dimension.
The curse of dimensionality affects robust generalization even on low-dimensional manifolds.
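To make the notion of robust error used in these findings concrete, here is a minimal sketch for the simplest case, a linear classifier. It is an illustration only, not the paper's construction (the paper analyzes ReLU networks). The l_inf perturbation norm, the function name `robust_error_linear`, and the toy data are all assumptions introduced for this example. It relies on the standard fact that an l_inf perturbation of radius eps can lower the margin y * (w·x + b) of a linear classifier by at most eps * ||w||_1.

```python
import numpy as np

def robust_error_linear(w, b, X, y, eps):
    """Fraction of points misclassified by some l_inf perturbation of size <= eps.

    Hypothetical helper for illustration. For sign(w.x + b), the worst-case
    perturbation reduces the margin y * (w.x + b) by exactly eps * ||w||_1,
    so no search over perturbations is needed.
    """
    margins = y * (X @ w + b)
    worst_case_margins = margins - eps * np.abs(w).sum()
    return float(np.mean(worst_case_margins <= 0))

# Toy, linearly separable data (assumed for illustration): labels in {-1, +1}.
X = np.array([[1.0, 1.0], [-1.0, -1.0], [0.1, 0.1], [-0.1, -0.1]])
y = np.array([1, -1, 1, -1])
w = np.array([1.0, 1.0])
b = 0.0

standard_err = robust_error_linear(w, b, X, y, eps=0.0)  # no perturbation allowed
robust_err = robust_error_linear(w, b, X, y, eps=0.2)    # perturbation radius 0.2
print(standard_err, robust_err)
```

On this toy set the classifier makes no standard errors, yet the two points close to the decision boundary can be flipped by a perturbation of radius 0.2. This is the distinction between standard and robust error that the findings rely on.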
Abstract
It is well known that modern neural networks are vulnerable to adversarial examples. To mitigate this problem, a series of robust learning algorithms have been proposed. However, although the robust training error can be near zero via some methods, all existing algorithms lead to a high robust generalization error. In this paper, we provide a theoretical understanding of this puzzling phenomenon from the perspective of expressive power for deep neural networks. Specifically, for binary classification problems with well-separated data, we show that, for ReLU networks, while mild over-parameterization is sufficient for high robust training accuracy, there exists a constant robust generalization gap unless the size of the neural network is exponential in the data dimension. This result holds even if the data is linearly separable (which means achieving standard generalization is easy),…
Taxonomy
Topics: Machine Learning and Algorithms · Face and Expression Recognition · Stochastic Gradient Optimization Techniques
