Implicit Regularization Towards Rank Minimization in ReLU Networks

Nadav Timor; Gal Vardi; Ohad Shamir

arXiv:2201.12760 · cs.LG · December 24, 2024 · 5 cites


TL;DR

This paper investigates the relationship between implicit regularization in ReLU neural networks and rank minimization, revealing both limitations and conditions under which low-rank solutions are favored.

Contribution

The paper provides new theoretical and empirical insights into when gradient flow in ReLU networks promotes low-rank solutions, extending the understanding previously established for linear models.

Findings

01

Gradient flow on ReLU networks may fail to minimize rank, even approximately, for "most" datasets of size 2.

02

ReLU networks of sufficient depth are provably biased towards low-rank solutions under certain conditions.

03

Empirical evidence supports the theoretical results.
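The findings above refer to "low-rank" solutions. When checking such a claim numerically, the exact rank of a trained weight matrix is typically full because of floating-point noise, so in practice one counts only singular values above a tolerance, or uses a continuous surrogate. The following NumPy sketch shows two such measures. The names `numerical_rank` and `stable_rank` and the default `rel_tol=1e-3` are our own illustrative choices, not definitions taken from the paper; NumPy's built-in `np.linalg.matrix_rank` applies a similar tolerance-based test.

```python
import numpy as np


def numerical_rank(W: np.ndarray, rel_tol: float = 1e-3) -> int:
    """Count singular values larger than rel_tol times the largest one.

    rel_tol is an illustrative assumption; the right threshold depends on
    the noise level of the training run being analyzed.
    """
    s = np.linalg.svd(W, compute_uv=False)  # singular values, descending
    if s[0] == 0.0:
        return 0  # the zero matrix has rank 0
    return int(np.sum(s > rel_tol * s[0]))


def stable_rank(W: np.ndarray) -> float:
    """||W||_F^2 / ||W||_2^2: a continuous surrogate bounded above by rank(W)."""
    s = np.linalg.svd(W, compute_uv=False)
    if s[0] == 0.0:
        raise ValueError("stable rank is undefined for the zero matrix")
    return float(np.sum(s**2) / s[0] ** 2)


if __name__ == "__main__":
    rank_one = np.outer([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
    print("rank-one:", numerical_rank(rank_one), stable_rank(rank_one))
    print("identity:", numerical_rank(np.eye(3)), stable_rank(np.eye(3)))
```

The stable rank changes smoothly as small singular values shrink, which makes it convenient for tracking rank over the course of training, while the thresholded count is easier to read off at the end of a run.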

Abstract

We study the conjectured relationship between the implicit regularization in neural networks, trained with gradient-based methods, and rank minimization of their weight matrices. Previously, it was proved that for linear networks (of depth 2 and vector-valued outputs), gradient flow (GF) w.r.t. the square loss acts as a rank minimization heuristic. However, understanding to what extent this generalizes to nonlinear networks is an open problem. In this paper, we focus on nonlinear ReLU networks, providing several new positive and negative results. On the negative side, we prove (and demonstrate empirically) that, unlike the linear case, GF on ReLU networks may no longer tend to minimize ranks, in a rather strong sense (even approximately, for "most" datasets of size 2). On the positive side, we reveal that ReLU networks of sufficient depth are provably biased towards low-rank solutions…
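The abstract contrasts gradient flow (GF) on linear and ReLU networks. GF is commonly simulated by full-batch gradient descent with a small constant step size (a forward-Euler discretization). The following NumPy sketch trains a depth-2 network f(x) = W2 σ(W1 x) on the square loss, with σ either the identity or ReLU, and returns the weights so their singular values can be inspected. Everything here is an illustrative assumption: the function name `train_two_layer`, the 0.5/n loss scaling, the small Gaussian initialization, the step size, the choice to inspect the first-layer matrix `W1`, and the toy size-2 dataset are ours, not the paper's experimental setup. No claim is made about which rank a particular run reaches.

```python
import numpy as np


def train_two_layer(X, Y, hidden=8, activation="relu", lr=0.05, steps=2000,
                    init_scale=1e-2, seed=0):
    """Full-batch gradient descent on L = 0.5/n * ||W2 act(W1 X) - Y||_F^2.

    X is (d, n) with one example per column; Y is (k, n). A small constant
    step size makes the updates a forward-Euler discretization of GF.
    """
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    rng = np.random.default_rng(seed)
    d, n = X.shape
    k = Y.shape[0]
    # Small random initialization (an assumption; the paper's setup may differ).
    W1 = init_scale * rng.standard_normal((hidden, d))
    W2 = init_scale * rng.standard_normal((k, hidden))
    losses = []
    for _ in range(steps):
        Z = W1 @ X                            # pre-activations, (hidden, n)
        if activation == "relu":
            H = np.maximum(Z, 0.0)
            dact = (Z > 0).astype(float)      # ReLU'(z), taken as 0 at z = 0
        elif activation == "linear":
            H = Z
            dact = np.ones_like(Z)
        else:
            raise ValueError(f"unknown activation: {activation}")
        R = (W2 @ H - Y) / n                  # dL/d(output), shape (k, n)
        losses.append(0.5 * n * float(np.sum(R**2)))
        # Both gradients use the current weights before either update.
        grad_W2 = R @ H.T
        grad_W1 = ((W2.T @ R) * dact) @ X.T
        W2 -= lr * grad_W2
        W1 -= lr * grad_W1
    return W1, W2, losses


if __name__ == "__main__":
    # Hypothetical toy dataset of size 2: inputs e1, e2 in R^3, rank-1 targets.
    X = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]])
    Y = np.outer([1.0, 0.5], [1.0, 0.5])
    for act in ("linear", "relu"):
        W1, W2, losses = train_two_layer(X, Y, activation=act)
        sv = np.linalg.svd(W1, compute_uv=False)
        print(act, f"loss {losses[0]:.3g} -> {losses[-1]:.3g}",
              "singular values of W1:", np.round(sv, 4))
```

Shrinking `lr` brings the iterates closer to the continuous-time flow at the cost of more steps. Comparing the singular values of `W1` across the two activations is one way to probe the linear/ReLU contrast the abstract describes, though a single toy run is not evidence for or against the theorems.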

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Machine Learning and ELM · Sparse and Compressive Sensing Techniques