Training Multi-Layer Over-Parametrized Neural Network in Subquadratic Time

Zhao Song; Lichen Zhang; Ruizhe Zhang

arXiv:2112.07628·cs.LG·November 27, 2023

TL;DR

This paper introduces a framework for training multi-layer over-parametrized neural networks whose per-iteration cost is subquadratic in the network width m, after a one-time m^2 initialization cost.

Contribution

It presents a framework that reduces the per-iteration training cost from quadratic to subquadratic in the network width m, for networks whose per-layer weight matrix is m × m.

Findings

01. Achieves a per-iteration training cost that is subquadratic in the network width m

02. Reduces the per-iteration cost from O(m^2) to m^{2 - Ω(1)}, paying the m^2 cost only in the initialization phase

03. Has implications for efficient fine-tuning of large language models

Abstract

We consider the problem of training a multi-layer over-parametrized neural network to minimize the empirical risk induced by a loss function. In the typical setting of over-parametrization, the network width $m$ is much larger than the data dimension $d$ and the number of training samples $n$ ($m = \mathrm{poly}(n, d)$), which induces a prohibitively large weight matrix $W \in \mathbb{R}^{m \times m}$ per layer. Naively, one has to pay $O(m^{2})$ time to read the weight matrix and evaluate the neural network function in both forward and backward computation. In this work, we show how to reduce the training cost per iteration. Specifically, we propose a framework that uses $m^{2}$ cost only in the initialization phase and achieves \emph{a truly subquadratic cost per iteration} in terms of $m$, i.e., $m^{2 - \Omega(1)}$ per iteration. Our result has implications beyond standard over-parametrization…
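To make the naive $O(m^{2})$ baseline concrete, the following is a minimal sketch of one dense layer's forward pass and its multiply-add count. It illustrates only the baseline the abstract describes, not the paper's framework. The function names, the ReLU activation, the width `m = 1024`, and the exponent gap `c = 0.5` are all assumptions chosen for illustration; the abstract states only that the gap is some constant hidden in $\Omega(1)$.

```python
import numpy as np


def naive_forward(W, h):
    """One dense layer with ReLU (activation is an assumption).

    The product W @ h touches every entry of the m-by-m matrix W,
    so it costs about m * m multiply-adds.
    """
    return np.maximum(W @ h, 0.0)


def naive_cost(m):
    """Multiply-add count of one m-by-m matrix-vector product."""
    return m * m


def subquadratic_budget(m, c):
    """Illustrative budget m^(2 - c) for a hypothetical constant c > 0."""
    return m ** (2.0 - c)


m = 1024  # hypothetical network width
rng = np.random.default_rng(0)
W = rng.standard_normal((m, m)) / np.sqrt(m)  # random weight matrix
h = rng.standard_normal(m)                    # input to the layer

out = naive_forward(W, h)
quadratic = naive_cost(m)
budget = subquadratic_budget(m, c=0.5)  # c = 0.5 is an arbitrary example
```

For this width the naive layer performs 1,048,576 multiply-adds, while an $m^{1.5}$ budget would allow 32,768; the ratio between the two grows as $m$ increases, which is why the exponent matters in the over-parametrized regime.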
