Why the Maximum Second Derivative of Activations Matters for Adversarial Robustness
Yunrui Yu, Hang Su, Jun Zhu

TL;DR
This paper explores how the curvature of activation functions, measured by their maximum second derivative, influences adversarial robustness, revealing an optimal curvature range that balances expressivity and stability across models and datasets.
Contribution
It introduces the Recursive Curvature-Tunable Activation Family (RCT-AF) for precise curvature control and demonstrates the existence of an optimal curvature range for adversarial robustness.
Findings
Optimal adversarial robustness occurs when max|σ''| lies between 4 and 10.
The normalized Hessian diagonal norm of the loss has a U-shaped dependence on max|σ''|.
Activation curvature shapes the Hessian of the loss, which in turn affects robust generalization.
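To see how activation curvature reaches the Hessian at all, consider a toy case. This is an illustrative assumption, not the paper's derivation: a single weight w, input x, pre-activation z = wx, output a = σ(z), and a scalar loss ℓ(a). Differentiating twice with the chain rule gives:

```latex
\frac{\partial L}{\partial w} = \ell'(a)\,\sigma'(z)\,x,
\qquad
\frac{\partial^2 L}{\partial w^2} = \ell''(a)\,\sigma'(z)^2\,x^2 \;+\; \ell'(a)\,\sigma''(z)\,x^2,
\qquad
\bigl|\ell'(a)\,\sigma''(z)\,x^2\bigr| \le |\ell'(a)|\,x^2 \max_z |\sigma''(z)|.
```

The second term is the only one that involves σ'', and its size is capped by max|σ''|. A sharper activation can therefore enlarge Hessian diagonal entries. This single-weight bound only shows the mechanism. It does not reproduce the paper's normalization of the Hessian diagonal norm or its U-shaped dependence.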
Abstract
This work investigates the critical role of activation function curvature, quantified by the maximum second derivative max|σ''|, in adversarial robustness. Using the Recursive Curvature-Tunable Activation Family (RCT-AF), which enables precise control over curvature through two tunable parameters, we systematically analyze this relationship. Our study reveals a fundamental trade-off: insufficient curvature limits model expressivity, while excessive curvature amplifies the normalized Hessian diagonal norm of the loss, leading to sharper minima that hinder robust generalization. The result is a non-monotonic relationship in which optimal adversarial robustness consistently occurs when max|σ''| falls within 4 to 10, a finding that holds across diverse network architectures, datasets, and adversarial training methods. We provide theoretical insights into how…
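The curvature measure max|σ''| can be estimated numerically for any smooth activation. The sketch below is illustrative only. It does not implement RCT-AF, whose definition is not given here. Instead it uses a sharpness-parameterized softplus, log(1 + exp(βx))/β, as a hypothetical stand-in whose curvature is known in closed form: max|σ''| = β/4, attained at x = 0. The helper names `max_abs_second_derivative` and `softplus`, the interval [-10, 10], and the grid size are all assumptions made for this example.

```python
import numpy as np


def max_abs_second_derivative(f, lo=-10.0, hi=10.0, n=200001):
    """Estimate max|f''(x)| on [lo, hi] with central finite differences.

    Assumption: the maximum lies inside [lo, hi] and f is smooth there.
    """
    x = np.linspace(lo, hi, n)
    h = x[1] - x[0]
    y = f(x)
    # Central second difference: f''(x_i) ~ (f(x_{i+1}) - 2 f(x_i) + f(x_{i-1})) / h^2
    d2 = (y[2:] - 2.0 * y[1:-1] + y[:-2]) / h**2
    return float(np.max(np.abs(d2)))


def softplus(beta):
    """Hypothetical stand-in activation (not RCT-AF): log(1 + exp(beta * x)) / beta.

    Its second derivative is beta * s * (1 - s) with s = sigmoid(beta * x),
    so max|sigma''| = beta / 4 and beta directly tunes the curvature.
    """
    def f(x):
        # logaddexp(0, t) = log(1 + exp(t)), computed without overflow
        return np.logaddexp(0.0, beta * x) / beta
    return f


if __name__ == "__main__":
    # Sweep a few sharpness values; beta = 16 and beta = 40 bracket the 4-10 range.
    for beta in (4.0, 16.0, 40.0):
        est = max_abs_second_derivative(softplus(beta))
        print(f"beta={beta:>4}: estimated max|sigma''| = {est:.4f}, exact = {beta / 4:.4f}")
    # tanh as a second check: exact value is 4 / (3 * sqrt(3))
    print(f"tanh: estimated {max_abs_second_derivative(np.tanh):.4f}")
```

With β = 16 and β = 40 the estimate should land near 4 and 10, the two ends of the reported range. The exact tanh value, 4/(3√3) ≈ 0.77, provides an independent check. Activations with a kink, such as ReLU, have no finite max|σ''|, so a smooth family is needed for this kind of sweep.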
Taxonomy
Topics: Adversarial Robustness in Machine Learning · Explainable Artificial Intelligence (XAI) · Stochastic Gradient Optimization Techniques
