Piecewise convexity of artificial neural networks
Blaine Rister, Daniel L Rubin

TL;DR
This paper establishes theoretical properties of neural networks with piecewise affine activations. It shows that they are piecewise convex in the input data, piecewise convex in the parameters of any single layer, and piecewise multi-convex in all parameters jointly, and it analyzes optimization methods for training such networks.
Contribution
It proves the piecewise convexity and multi-convexity of neural networks with piecewise affine activations such as ReLU, and characterizes the local minima and stationary points of the training objective.
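For ReLU networks, the input-side result has a concrete intuition. The sketch below uses arbitrary, hypothetical weights (W1, b1, w2, b2 are not from the paper). It shows that once the activation pattern is fixed, a one-hidden-layer ReLU network reduces to an affine function of the input. The activation pattern records which hidden units are active. Each pattern corresponds to an intersection of half-spaces, which is a convex region, so composing the network with a convex loss gives a function that is convex on each region. This is a standard argument offered for intuition, not the authors' proof.

```python
# Illustrative sketch with arbitrary, hypothetical weights:
# 2 inputs -> 3 ReLU hidden units -> 1 linear output.
W1 = [[1.0, -2.0], [0.5, 1.0], [-1.0, 0.3]]
b1 = [0.1, -0.2, 0.0]
w2 = [2.0, -1.0, 0.5]
b2 = 0.3


def pre_activations(x):
    """Hidden-layer pre-activations W1 x + b1."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(W1, b1)]


def forward(x):
    """Network output w2 . relu(W1 x + b1) + b2."""
    return sum(v * max(p, 0.0) for v, p in zip(w2, pre_activations(x))) + b2


def activation_pattern(x):
    """Which hidden units are active (pre-activation > 0) at input x."""
    return tuple(p > 0.0 for p in pre_activations(x))


def local_affine_map(pattern):
    """Return (a, c) such that forward(x) == a . x + c for every x with this pattern.

    Inactive units output 0 and active units pass their pre-activation through
    unchanged, so on each region the network collapses to an affine map.
    """
    a = [0.0] * len(W1[0])
    c = b2
    for active, row, b, v in zip(pattern, W1, b1, w2):
        if active:
            a = [ai + v * w for ai, w in zip(a, row)]
            c += v * b
    return a, c


# Compare the forward pass with the region's affine formula at a few inputs.
for x in ([0.3, 0.4], [-1.0, 2.0], [2.0, -0.5]):
    pattern = activation_pattern(x)
    a, c = local_affine_map(pattern)
    affine_value = sum(ai * xi for ai, xi in zip(a, x)) + c
    print(pattern, forward(x), affine_value)
```

Different inputs can land in different regions, each with its own affine map. The "pieces" in piecewise convexity correspond to these regions.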
Findings
Networks are piecewise convex in input data.
Networks are piecewise multi-convex in parameters.
Convergence conditions are derived for gradient descent and for a method that repeatedly solves convex sub-problems.
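Alternating (block-coordinate) minimization is one simple instance of the convex sub-problem idea. The sketch below assumes the simplest setting: a two-layer linear network with a single hidden unit, trained on the standard basis inputs e_j with target columns M[:, j]. The identity is a piecewise affine activation with one piece. In this setting the loss is the sum of (M_ij - u_i v_j)^2. With either layer fixed, the loss is an ordinary least-squares problem in the other layer, and the code solves it in closed form. The function names are hypothetical, and the method analyzed in the paper may differ in its details.

```python
def loss(M, u, v):
    """Squared error of the rank-one network output u v^T against targets M."""
    return sum((M[i][j] - u[i] * v[j]) ** 2
               for i in range(len(u)) for j in range(len(v)))


def solve_u(M, v):
    """Exact minimizer over the output layer u with v fixed (least squares)."""
    vv = sum(vj * vj for vj in v)
    return [sum(row[j] * v[j] for j in range(len(v))) / vv for row in M]


def solve_v(M, u):
    """Exact minimizer over the hidden layer v with u fixed (least squares)."""
    uu = sum(ui * ui for ui in u)
    return [sum(M[i][j] * u[i] for i in range(len(u))) / uu
            for j in range(len(M[0]))]


def alternating_minimization(M, v0, sweeps=50):
    """Alternate exact convex sub-problem solves, recording the loss after each."""
    v = list(v0)
    u = solve_u(M, v)
    history = [loss(M, u, v)]
    for _ in range(sweeps):
        v = solve_v(M, u)
        history.append(loss(M, u, v))
        u = solve_u(M, v)
        history.append(loss(M, u, v))
    return u, v, history


# Hypothetical example targets; v0 must not be orthogonal to M's top singular vector.
M = [[3.0, 1.0], [1.0, 2.0]]
u, v, history = alternating_minimization(M, [1.0, 0.0])
print(history[0], history[-1])
```

Each step solves its sub-problem exactly, so the recorded losses never increase. For this M the iterates approach the best rank-one fit, whose residual is the squared smaller singular value, ((5 - √5)/2)^2 ≈ 1.9098. Monotone decrease alone does not, in general, certify a global minimum.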
Abstract
Although artificial neural networks have shown great promise in applications including computer vision and speech recognition, there remains considerable practical and theoretical difficulty in optimizing their parameters. The seemingly unreasonable success of gradient descent methods in minimizing these non-convex functions remains poorly understood. In this work we offer some theoretical guarantees for networks with piecewise affine activation functions, which have in recent years become the norm. We prove three main results. Firstly, that the network is piecewise convex as a function of the input data. Secondly, that the network, considered as a function of the parameters in a single layer, all others held constant, is again piecewise convex. Finally, that the network as a function of all its parameters is piecewise multi-convex, a generalization of biconvexity. From here we…
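Multi-convexity, like the biconvexity it generalizes, is weaker than joint convexity. The following numerical check uses assumed values: one training example with x = 1 and y = 1, and a hypothetical one-unit network w2 · relu(w1 · x). On the region w1 > 0 the ReLU is active and the loss reduces to (w1 w2 - 1)^2. There, the loss passes a midpoint-convexity test along each layer's parameter separately but fails it jointly. Random sampling is only a spot check, not a proof.

```python
import random


def relu(z):
    return max(z, 0.0)


def loss(w1, w2, x=1.0, y=1.0):
    """Squared error of the one-unit network w2 * relu(w1 * x) on one example."""
    return (w2 * relu(w1 * x) - y) ** 2


def midpoint_gap(p, q):
    """(loss(p) + loss(q)) / 2 - loss(midpoint); negative means convexity fails."""
    mid = tuple((a + b) / 2 for a, b in zip(p, q))
    return (loss(*p) + loss(*q)) / 2 - loss(*mid)


rng = random.Random(0)
worst_block_gap = 0.0
for _ in range(1000):
    w1a, w1b, w2a, w2b = (rng.uniform(0.1, 3.0) for _ in range(4))
    # Vary only w1 (w2 fixed), then only w2 (w1 fixed); all points keep w1 > 0.
    worst_block_gap = min(worst_block_gap,
                          midpoint_gap((w1a, w2a), (w1b, w2a)),
                          midpoint_gap((w1a, w2a), (w1a, w2b)))

# Two zero-loss points whose midpoint (1.25, 1.25) has positive loss.
joint_gap = midpoint_gap((0.5, 2.0), (2.0, 0.5))
print(worst_block_gap, joint_gap)  # joint_gap is -0.31640625
```

Holding one layer fixed leaves a convex quadratic in the other, so the per-layer gaps are nonnegative up to rounding. Moving both layers at once produces a negative gap, which rules out joint convexity even within a single activation region.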
