Training Multi-Layer Over-Parametrized Neural Network in Subquadratic Time
Zhao Song, Lichen Zhang, Ruizhe Zhang

TL;DR
This paper introduces a method for training multi-layer over-parametrized neural networks in time per iteration that is subquadratic in the network width, lowering the training cost of very wide models.
Contribution
It presents a framework that pays quadratic cost in the network width only once, at initialization, and then reduces the per-iteration training time from quadratic to subquadratic in the width. The framework applies to large over-parametrized neural networks.
Findings
Achieves a per-iteration training cost that is subquadratic in the network width m
Reduces training time from O(m^2) to m^{2 - Ω(1)} per iteration
Potentially applicable to speeding up the fine-tuning of large pre-trained models such as large language models
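To make the gap between O(m^2) and m^{2 - Ω(1)} concrete, the sketch below ignores all constants and treats the Ω(1) saving as a fixed exponent alpha > 0. The value of alpha and the helper name `per_iteration_costs` are hypothetical illustrations; the summary does not state the exponent. Under these assumptions the per-iteration speedup is m^2 / m^{2 - alpha} = m^alpha, which grows with the width m.

```python
def per_iteration_costs(m: int, alpha: float) -> tuple[float, float]:
    """Return (naive, subquadratic) per-iteration cost for width m.

    Constants are ignored; alpha is a hypothetical stand-in for the
    Omega(1) saving in the exponent, not a value from the paper.
    """
    if m < 1 or alpha <= 0:
        raise ValueError("need m >= 1 and alpha > 0")
    naive = float(m) ** 2            # O(m^2): touch every entry of an m x m weight matrix
    fast = float(m) ** (2 - alpha)   # m^{2 - alpha}: the subquadratic regime
    return naive, fast


# The speedup naive / fast equals m ** alpha, so it widens as m grows.
for m in (10**3, 10**4, 10**5):
    naive, fast = per_iteration_costs(m, alpha=0.5)
    print(f"m={m:>6}  naive={naive:.2e}  subquadratic={fast:.2e}  speedup={naive / fast:.1f}x")
```

Any fixed alpha > 0 gives a speedup that is unbounded in m, which is what separates a truly subquadratic bound from a constant-factor improvement.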
Abstract
We consider the problem of training a multi-layer over-parametrized neural network to minimize the empirical risk induced by a loss function. In the typical setting of over-parametrization, the network width m is much larger than the data dimension and the number of training samples, which induces a prohibitively large m × m weight matrix per layer. Naively, one has to pay O(m^2) time to read the weight matrix and evaluate the neural network function in both forward and backward computation. In this work, we show how to reduce the training cost per iteration. Specifically, we propose a framework that uses m^2 cost only in the initialization phase and achieves a truly subquadratic cost per iteration in terms of m, i.e., m^{2 - Ω(1)} per iteration. Our result has implications beyond standard over-parametrization…
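The abstract's cost accounting can be illustrated with a back-of-the-envelope comparison for a single m × m layer over T iterations. Naive training pays about T·m^2, while a scheme that pays m^2 once at initialization and m^{2 - alpha} per iteration pays about m^2 + T·m^{2 - alpha}. The sketch below is accounting only, not the paper's algorithm: the helper names are hypothetical, alpha is an assumed stand-in for the Ω(1) exponent saving, and constants and lower-order terms are ignored. It also counts the multiply-adds of a naive dense matrix-vector product to show where the O(m^2) per-layer cost comes from.

```python
import math


def naive_matvec(W: list[list[float]], x: list[float]) -> tuple[list[float], int]:
    """Dense y = W x; also returns the number of multiply-adds performed.

    For an m x m matrix this is exactly m * m, the O(m^2) cost of
    evaluating one layer naively.
    """
    ops = 0
    y = []
    for row in W:
        acc = 0.0
        for w_ij, x_j in zip(row, x):
            acc += w_ij * x_j
            ops += 1
        y.append(acc)
    return y, ops


def total_cost_naive(m: int, T: int) -> float:
    # Pay ~m^2 in every one of T iterations.
    return T * float(m) ** 2


def total_cost_amortized(m: int, T: int, alpha: float) -> float:
    # Pay ~m^2 once at initialization, then ~m^{2 - alpha} per iteration.
    # alpha > 0 is a hypothetical stand-in for the Omega(1) exponent saving.
    return float(m) ** 2 + T * float(m) ** (2 - alpha)


def break_even_iterations(m: int, alpha: float) -> int:
    # Smallest integer T with m^2 + T * m^{2-alpha} < T * m^2,
    # i.e. T > 1 / (1 - m^{-alpha}). Requires m >= 2 and alpha > 0.
    if m < 2 or alpha <= 0:
        raise ValueError("need m >= 2 and alpha > 0")
    threshold = 1.0 / (1.0 - float(m) ** (-alpha))
    return math.floor(threshold) + 1


m = 64
W = [[1.0] * m for _ in range(m)]
_, ops = naive_matvec(W, [1.0] * m)
print(ops)  # 4096 (= 64 * 64 multiply-adds)
print(break_even_iterations(10**4, alpha=0.5))  # 2
```

Under these idealized assumptions the one-time m^2 initialization is recovered within a couple of iterations; in practice, hidden constants and lower-order terms determine the real break-even point.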
