Efficient Global Optimization of Two-Layer ReLU Networks: Quadratic-Time Algorithms and Adversarial Training

Yatong Bai; Tanmay Gautam; Somayeh Sojoudi

arXiv:2201.01965 · cs.LG · June 18, 2025

TL;DR

This paper introduces two quadratic-time algorithms that train two-layer ReLU neural networks with global convergence guarantees by solving convex reformulations of the non-convex training problem. It also extends the convex approach to adversarial training.

Contribution

The paper develops efficient convex optimization algorithms that globally train two-layer ReLU networks, including adversarially robust models, with theoretical convergence guarantees.

Findings

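01. The ADMM-based algorithm achieves linear global convergence, with quadratic per-iteration time complexity when solving the approximate formulation.

02. The first several iterations often already yield a solution with high prediction accuracy.

03. Convex formulations built on robust optimization train networks that are robust to adversarial inputs.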

Abstract

The non-convexity of the artificial neural network (ANN) training landscape brings inherent optimization difficulties. While the traditional back-propagation stochastic gradient descent (SGD) algorithm and its variants are effective in certain cases, they can become stuck at spurious local minima and are sensitive to initializations and hyperparameters. Recent work has shown that the training of an ANN with ReLU activations can be reformulated as a convex program, bringing hope to globally optimizing interpretable ANNs. However, naively solving the convex training formulation has an exponential complexity, and even an approximation heuristic requires cubic time. In this work, we characterize the quality of this approximation and develop two efficient algorithms that train ANNs with global convergence guarantees. The first algorithm is based on the alternating direction method of…
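The abstract does not spell out the approximation heuristic. In the convex-training literature, the exponential cost comes from enumerating every ReLU activation pattern the data can produce, and a common approximation samples a subset of those patterns instead. The following is a minimal sketch under that assumption, not the paper's implementation; the function name `sample_activation_patterns`, the Gaussian sampling scheme, and the toy data are all hypothetical.

```python
import numpy as np

def sample_activation_patterns(X, num_samples, seed=0):
    """Return the distinct ReLU activation patterns 1[X u >= 0]
    produced by randomly drawn directions u (illustrative only)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    # Hypothetical sampling scheme: i.i.d. standard Gaussian directions.
    U = rng.standard_normal((d, num_samples))
    # Each row is one 0/1 pattern of length n: which data points
    # a hidden ReLU neuron with weight vector u would activate on.
    patterns = (X @ U >= 0).astype(np.int8).T
    # Different directions can induce the same pattern; keep each once.
    return np.unique(patterns, axis=0)

# Toy data: 20 points in 3 dimensions.
X = np.random.default_rng(1).standard_normal((20, 3))
D = sample_activation_patterns(X, num_samples=50)
print(D.shape)  # (number of distinct sampled patterns, 20)
```

Each sampled pattern fixes which data points a neuron is active on, which is what makes the resulting training problem convex in the remaining weights. The number of sampled patterns trades approximation quality against problem size.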

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Neural Networks and Applications · Machine Learning and ELM