Convex Formulations for Training Two-Layer ReLU Neural Networks

Karthik Prakhya; Tolga Birdal; Alp Yurtsever

arXiv:2410.22311·cs.LG·March 18, 2025

TL;DR

This paper introduces a convex reformulation of training infinite-width two-layer ReLU neural networks, enabling polynomial-time approximations and demonstrating competitive test accuracy in classification tasks.

Contribution

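The paper reformulates the training of infinite-width two-layer ReLU networks as a convex completely positive program in a finite-dimensional (lifted) space. Because that program is still NP-hard, it also proposes a semidefinite relaxation that can be solved in polynomial time.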

Findings

01 · The semidefinite relaxation can be solved in polynomial time, whereas the completely positive program it relaxes is NP-hard.

02 · The relaxation achieves competitive test accuracy on the classification tasks evaluated.

03 · The convex formulation provides insights into neural network training dynamics.

Abstract

Solving non-convex, NP-hard optimization problems is crucial for training machine learning models, including neural networks. However, non-convexity often leads to black-box machine learning models with unclear inner workings. While convex formulations have been used for verifying neural network robustness, their application to training neural networks remains less explored. In response to this challenge, we reformulate the problem of training infinite-width two-layer ReLU networks as a convex completely positive program in a finite-dimensional (lifted) space. Despite the convexity, solving this problem remains NP-hard due to the complete positivity constraint. To overcome this challenge, we introduce a semidefinite relaxation that can be solved in polynomial time. We then experimentally evaluate the tightness of this relaxation, demonstrating its competitive performance in test…
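
To make the gap between the two cones in the abstract concrete, here is a minimal NumPy sketch. It assumes the standard definitions: a matrix is completely positive if it equals B Bᵀ for some entrywise nonnegative B, and a common tractable outer approximation is the doubly nonnegative cone (positive semidefinite and entrywise nonnegative). Whether the paper's relaxation uses exactly this cone is not stated in the abstract, and the function names below are hypothetical, not taken from the authors' code.

```python
import numpy as np


def completely_positive_sample(n, rank, seed=0):
    """Build a completely positive matrix X = B @ B.T with B >= 0 entrywise."""
    rng = np.random.default_rng(seed)
    B = rng.random((n, rank))  # entries in [0, 1), hence nonnegative
    return B @ B.T


def is_doubly_nonnegative(X, tol=1e-9):
    """Check the tractable necessary conditions for complete positivity:
    symmetric, positive semidefinite, and entrywise nonnegative."""
    symmetric = np.allclose(X, X.T, atol=tol)
    psd = np.linalg.eigvalsh((X + X.T) / 2).min() >= -tol
    nonnegative = X.min() >= -tol
    return bool(symmetric and psd and nonnegative)


if __name__ == "__main__":
    X = completely_positive_sample(5, 3)
    print(is_doubly_nonnegative(X))  # every completely positive matrix passes
```

Every completely positive matrix passes this check, but the converse fails in general for matrices of size 5 or larger, which is why a relaxation of this kind can be loose and its tightness has to be assessed separately.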

Peer Reviews

Decision·ICLR 2025 Poster

Reviewer 01 · Rating 8 · Confidence 2

Strengths

This is an interesting paper that derives an equivalence between infinite-width ReLU network training and solving a certain convex copositive program. This work contributes to the growing literature relating neural network training and convex optimization. The empirical results are also promising. Overall, I think this is an interesting work that provides valuable insight.

Weaknesses

The empirical results seem somewhat weak to me. The authors acknowledge the similarity of their work to earlier work relating copositive programming and ReLU network training. Although the existing methods make additional assumptions on the data distribution, a numerical comparison to prior work (e.g. the approximation ratio) would help in understanding how the proposed framework compares to earlier work in practice. Further discussion of the applications of the technique would also be useful.

Reviewer 02 · Rating 6 · Confidence 3

Strengths

The paper is clear, and reads well. The supplementary material provides the code to reproduce the results. Section 2 provides a complete yet brief background on the relevant optimization topics and concepts.

Weaknesses

The paper does not consider bias terms in the linear layers. The tightness of the proposed relaxation is evaluated only empirically. There is no convergence guarantee for the TOS rounding step. The rounding step is performed using the critical width; in practice, however, this is infeasible. Time complexity is not considered in the evaluations. The empirical evaluation is performed on classification tasks, using L2 loss.
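
For readers unfamiliar with the setting this reviewer describes, the following is a minimal NumPy sketch of a two-layer ReLU network without bias terms, scored with an L2 (squared) loss. The function names, shapes, and the 1/2 scaling of the loss are illustrative assumptions, not the paper's notation or the authors' implementation.

```python
import numpy as np


def two_layer_relu(X, W1, W2):
    """Forward pass of a bias-free two-layer ReLU network.

    X:  (n_samples, d_in) inputs
    W1: (d_in, width) first-layer weights
    W2: (width, d_out) second-layer weights
    """
    hidden = np.maximum(X @ W1, 0.0)  # ReLU activation, no bias added
    return hidden @ W2


def l2_loss(X, Y, W1, W2):
    """Half the sum of squared residuals; Y may hold one-hot class labels."""
    residual = two_layer_relu(X, W1, W2) - Y
    return 0.5 * float(np.sum(residual ** 2))


if __name__ == "__main__":
    X = np.array([[1.0, -1.0]])
    W1 = np.eye(2)
    W2 = np.array([[1.0], [1.0]])
    Y = np.array([[0.0]])
    print(l2_loss(X, Y, W1, W2))
```

Because ReLU is positively homogeneous, scaling `W1` by a constant c > 0 and `W2` by 1/c leaves the output unchanged. This is one reason the training problem in terms of `W1` and `W2` is non-convex.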

Reviewer 03 · Rating 6 · Confidence 3

Strengths

- The idea of the proposed method, based on a convex completely positive program and a semidefinite relaxation, is interesting.
- The presentation of the paper is fairly clear.

Weaknesses

- The proposed method applies only to training wide two-layer neural networks with the ReLU activation function. Extending the method to deeper networks seems non-trivial, which limits its practical value. It would be beneficial if the authors could discuss the possibility of applying the proposed method to train modern deep neural networks.
- As the authors have commented, the problem (CP-NN) is NP-hard due to the complete positivity constraint, and as far as I can

Code & Models

Repositories

KarthikPrakhya/SDPNN-IW · jax · Official

Taxonomy

Topics: Neural Networks and Applications · Machine Learning and ELM · Brain Tumor Detection and Classification