TL;DR
This paper proves that block coordinate descent (BCD) can drive the training loss of deep neural networks with strictly monotonically increasing activations to a global minimum. It extends the guarantee to ReLU activations through a modified algorithm and derives generalization bounds, with empirical results supporting the theory.
Contribution
It provides the first proof of global convergence for BCD in neural networks and extends the guarantee to ReLU activations with a modified BCD algorithm that uses skip connections and non-negative projection.
Findings
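BCD converges to global minima with strictly monotonically increasing activations.
The loss with respect to the output layer decreases exponentially, while the loss with respect to the hidden layers remains well-controlled.
Empirical results confirm the theoretical convergence guarantees and the effectiveness of BCD.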
Abstract
In this paper, we consider a block coordinate descent (BCD) algorithm for training deep neural networks and provide a new global convergence guarantee under strictly monotonically increasing activation functions. While existing works demonstrate convergence to stationary points for BCD in neural networks, our contribution is the first to prove convergence to global minima, ensuring arbitrarily small loss. We show that the loss with respect to the output layer decreases exponentially while the loss with respect to the hidden layers remains well-controlled. Additionally, we derive generalization bounds using the Rademacher complexity framework, demonstrating that BCD not only achieves strong optimization guarantees but also provides favorable generalization performance. Moreover, we propose a modified BCD algorithm with skip connections and non-negative projection, extending our…
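To make the idea of "block" updates concrete, here is a minimal, hypothetical sketch of block coordinate descent on a tiny two-layer network with a strictly increasing activation (tanh). It is not the paper's algorithm. The abstract does not specify the block structure or update rule, so this sketch assumes that the blocks are the output-layer weights `v` and the hidden-layer weights `w`, and that each block update is a single gradient step taken while the other block is frozen. The function names (`forward`, `mse`, `bcd_train`) and all hyperparameters are illustrative.

```python
import math
import random


def forward(w, v, x):
    """Hypothetical two-layer net: sum_j v[j] * tanh(w[j] * x) for scalar x."""
    return sum(vj * math.tanh(wj * x) for wj, vj in zip(w, v))


def mse(w, v, xs, ys):
    """Mean squared error of the network over the dataset."""
    return sum((forward(w, v, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def bcd_train(xs, ys, width=8, steps=200, lr=0.1, seed=0):
    """Alternate gradient steps on two blocks (an assumed block structure)."""
    rng = random.Random(seed)
    w = [rng.uniform(-1.0, 1.0) for _ in range(width)]  # hidden-layer block
    v = [rng.uniform(-0.5, 0.5) for _ in range(width)]  # output-layer block
    n = len(xs)
    history = [mse(w, v, xs, ys)]
    for _ in range(steps):
        # Block 1: update the output weights v while w is frozen.
        # For fixed w the loss is a convex quadratic in v.
        res = [forward(w, v, x) - y for x, y in zip(xs, ys)]
        grad_v = [
            2.0 / n * sum(r * math.tanh(wj * x) for r, x in zip(res, xs))
            for wj in w
        ]
        v = [vj - lr * g for vj, g in zip(v, grad_v)]
        # Block 2: update the hidden weights w while the new v is frozen.
        # d/dw_j tanh(w_j x) = (1 - tanh(w_j x)^2) * x
        res = [forward(w, v, x) - y for x, y in zip(xs, ys)]
        grad_w = [
            2.0 / n * sum(
                r * vj * (1.0 - math.tanh(wj * x) ** 2) * x
                for r, x in zip(res, xs)
            )
            for wj, vj in zip(w, v)
        ]
        w = [wj - lr * g for wj, g in zip(w, grad_w)]
        history.append(mse(w, v, xs, ys))
    return w, v, history


if __name__ == "__main__":
    xs = [i / 10 for i in range(-10, 11)]
    ys = [0.5 * math.tanh(2 * x) for x in xs]
    _, _, history = bcd_train(xs, ys)
    print(f"initial loss {history[0]:.4f}, final loss {history[-1]:.4f}")
```

Updating the output block first mirrors the abstract's split between the output-layer loss and the hidden-layer loss. The paper's actual BCD scheme and its ReLU variant with skip connections and non-negative projection are not reproduced here.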