The Computational Complexity of ReLU Network Training Parameterized by Data Dimensionality

Vincent Froese; Christoph Hertrich; Rolf Niedermeier

arXiv:2105.08675 · cs.LG · August 24, 2022

TL;DR

This paper investigates the computational complexity of training two-layer ReLU neural networks. It shows that training is W[1]-hard when parameterized by the data dimension d, and it extends known polynomial-time algorithms to a broader class of loss functions.
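For concreteness, one common way to state the training problem for a two-layer ReLU network with k hidden neurons on n training points $(x_i, y_i) \in \mathbb{R}^d \times \mathbb{R}$ is sketched below. This formalization is an assumption for illustration, not necessarily the paper's exact definition (the paper may, for example, restrict the output weights $a_j$ or phrase the loss differently). Here $\mathcal{L}$ stands for a loss on the vector of residuals, such as an $\ell^p$-loss.

```latex
\varphi(x) = \sum_{j=1}^{k} a_j \max\bigl(0,\, w_j^{\top} x + b_j\bigr),
\qquad w_j \in \mathbb{R}^{d},\; b_j, a_j \in \mathbb{R},
\\[4pt]
\min_{(w_j,\, b_j,\, a_j)_{j=1}^{k}}\;
\mathcal{L}\bigl(\varphi(x_1) - y_1,\, \dots,\, \varphi(x_n) - y_n\bigr).
```

In this notation, the parameter studied in the paper is the input dimension d.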

Contribution

It provides running-time lower bounds, in the form of W[1]-hardness with respect to the data dimension d, and extends polynomial-time algorithms to more general loss functions.

Findings

1. Training two-layer ReLU networks is W[1]-hard with respect to the data dimension d.
2. Known brute-force strategies are essentially optimal, assuming the Exponential Time Hypothesis (ETH).
3. Polynomial-time algorithms are extended to a broader class of loss functions.
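The brute-force strategies mentioned above are not spelled out on this page. A strategy commonly cited in this line of work (assumed here to be the kind meant) enumerates, for each hidden neuron, which data points it activates, and then solves an optimization problem for each combination of such activation patterns. In dimension d, the number of patterns a single hyperplane can cut out of n points grows on the order of n^d, which is where the dimension enters the running time. The minimal sketch below covers only the enumeration step, only for d = 1 and a single neuron; the function name `relu_activation_patterns_1d` is hypothetical.

```python
from typing import FrozenSet, List, Sequence, Set


def relu_activation_patterns_1d(xs: Sequence[float]) -> Set[FrozenSet[int]]:
    """Return every set of point indices that one ReLU neuron
    x -> max(0, w*x + b) can make active (w*x + b > 0), for distinct 1-D inputs.

    Illustrative sketch: in 1-D an active set is either everything to the
    right of a threshold (w > 0) or everything to the left of it (w < 0);
    w = 0 yields the empty or the full set, which both families already contain.
    """
    order: List[int] = sorted(range(len(xs)), key=lambda i: xs[i])
    patterns: Set[FrozenSet[int]] = set()
    for cut in range(len(order) + 1):
        # w > 0: points with x > -b/w are active (a suffix in sorted order).
        patterns.add(frozenset(order[cut:]))
        # w < 0: points with x < -b/w are active (a prefix in sorted order).
        patterns.add(frozenset(order[:cut]))
    return patterns


if __name__ == "__main__":
    for n in (1, 5, 10):
        points = [float(i) for i in range(n)]
        print(n, len(relu_activation_patterns_1d(points)))
```

For n ≥ 1 distinct points this yields 2n patterns, so the loop above runs in time linear in n; in higher dimensions the analogous count, and hence the brute force over all k neurons, grows much faster with d.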

Abstract

Understanding the computational complexity of training simple neural networks with rectified linear units (ReLUs) has recently been a subject of intensive research. Closing gaps and complementing results from the literature, we present several results on the parameterized complexity of training two-layer ReLU networks with respect to various loss functions. After a brief discussion of other parameters, we focus on analyzing the influence of the dimension $d$ of the training data on the computational complexity. We provide running time lower bounds in terms of W[1]-hardness for parameter $d$ and prove that known brute-force strategies are essentially optimal (assuming the Exponential Time Hypothesis). In comparison with previous work, our results hold for a broad(er) range of loss functions, including $\ell^p$-loss for all $p \in [0, \infty]$. In particular, we extend a known…
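The abstract covers $\ell^p$-loss for every $p \in [0, \infty]$, including the two endpoints. As a minimal sketch, assuming the usual conventions (the paper's exact normalization is not given on this page), the loss of a residual vector can be computed as follows: p = 0 counts nonzero residuals, 0 < p < ∞ sums |residual|^p, and p = ∞ takes the largest absolute residual. The function name `lp_loss` is hypothetical.

```python
import math
from typing import Sequence


def lp_loss(predictions: Sequence[float], targets: Sequence[float], p: float) -> float:
    """l^p training loss of the residuals predictions[i] - targets[i], p in [0, inf].

    Assumed conventions (illustrative, not taken from the paper):
      p == 0       -> number of nonzero residuals
      0 < p < inf  -> sum of |residual| ** p; the 1/p root is omitted because
                      it is monotone and does not change which weights are optimal
      p == inf     -> largest absolute residual
    """
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    if p < 0:
        raise ValueError("p must lie in [0, inf]")
    residuals = [abs(pred - y) for pred, y in zip(predictions, targets)]
    if p == 0:
        # l^0: count the data points that are not fitted exactly.
        return float(sum(1 for r in residuals if r != 0))
    if math.isinf(p):
        # l^inf: worst-case error over the data points.
        return max(residuals, default=0.0)
    return sum(r ** p for r in residuals)


if __name__ == "__main__":
    preds, ys = [1.0, 2.0, 3.0], [1.0, 0.0, 4.0]
    for p in (0, 0.5, 1, 2, math.inf):
        print(p, lp_loss(preds, ys, p))
```

The endpoints behave quite differently from the convex cases 1 ≤ p < ∞: the ℓ^0 case is a counting objective and is not continuous in the weights, which is one reason results that cover the whole range [0, ∞] are broader than results for convex losses only.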
