On the Rate of Convergence of GD in Non-linear Neural Networks: An Adversarial Robustness Perspective

Guy Smorodinsky; Sveta Gimpleson; Itay Safran

arXiv:2603.02095 · cs.LG · March 3, 2026

Open Access

TL;DR

This paper analyzes the convergence rate of Gradient Descent (GD) in a minimal two-neuron ReLU network and shows that the robustness margin is maximized only at a slow Θ(1/ln(t)) rate, a result supported by both theoretical proofs and empirical simulations.

Contribution

The paper provides what is, to the best of the authors' knowledge, the first explicit lower bound on the convergence rate of the robustness margin in a non-linear neural network setting.

Findings

1. GD converges to the robustness margin at a rate of Θ(1/ln(t)).

2. The slow convergence rate is consistent across different initializations.

3. The analysis applies to a minimal two-neuron ReLU network with two training points.

Abstract

We study the convergence dynamics of Gradient Descent (GD) in a minimal binary classification setting, consisting of a two-neuron ReLU network and two training instances. We prove that even under these strong simplifying assumptions, while GD successfully converges to an optimal robustness margin, effectively maximizing the distance between the decision boundary and the training points, this convergence occurs at a prohibitively slow rate, scaling strictly as $\Theta(1/\ln(t))$. To the best of our knowledge, this establishes the first explicit lower bound on the convergence rate of the robustness margin in a non-linear model. Through empirical simulations, we further demonstrate that this inherent failure mode is pervasive, exhibiting the exact same tight convergence rate across multiple natural network initializations. Our theoretical guarantees are derived via a rigorous analysis of…
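To give a feel for why a Θ(1/ln(t)) rate counts as prohibitively slow, the sketch below computes how many GD iterations an idealized gap of exactly c/ln(t) would need to fall below a target ε, and compares this with a gap of exactly c/t. The constant c = 1, the function names, and the reading of the rate as a bound on the gap between the current and the optimal margin are illustrative assumptions rather than details taken from the paper; Θ(·) hides constants and only describes asymptotic behavior.

```python
import math


def log10_iterations_log_rate(eps: float, c: float = 1.0) -> float:
    """Return log10(t) for the t that solves c / ln(t) = eps.

    Assumption (hypothetical, for illustration): the gap equals exactly
    c / ln(t). Solving gives t = exp(c / eps). We return log10(t) instead
    of t itself because exp(c / eps) overflows a float for small eps.
    """
    if eps <= 0 or c <= 0:
        raise ValueError("eps and c must be positive")
    return (c / eps) / math.log(10.0)


def iterations_inverse_rate(eps: float, c: float = 1.0) -> float:
    """Return the t that solves c / t = eps (a faster rate, for comparison)."""
    if eps <= 0 or c <= 0:
        raise ValueError("eps and c must be positive")
    return c / eps


if __name__ == "__main__":
    for eps in (0.1, 0.05, 0.01):
        slow = log10_iterations_log_rate(eps)
        fast = iterations_inverse_rate(eps)
        print(f"eps={eps}: c/ln(t) needs ~10^{slow:.1f} steps, c/t needs {fast:.0f} steps")
```

Under these assumptions, halving the target gap squares the required number of iterations: a gap of 0.1 needs about e^10 ≈ 2.2 × 10^4 steps, while a gap of 0.01 needs about e^100 ≈ 10^43 steps, compared with 10 and 100 steps under a c/t rate.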

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Adversarial Robustness in Machine Learning · Advanced Neural Network Applications