Greedy Pruning with Group Lasso Provably Generalizes for Matrix Sensing

Nived Rajaraman; Devvrit; Aryan Mokhtari; Kannan Ramchandran

arXiv:2303.11453 · cs.LG · June 6, 2023 · 1 citation

TL;DR

This paper provides the first rigorous theoretical analysis explaining why greedy pruning combined with fine-tuning leads to smaller models that generalize well, focusing on overparameterized matrix sensing with group Lasso regularization.

Contribution

It introduces a provable framework showing how pruning and fine-tuning with regularization result in minimal models that generalize well in matrix sensing.

Findings

1. Pruning every column whose ℓ2 norm falls below a certain threshold yields a minimal model that is close to the ground truth.
2. Gradient descent initialized at the pruned model converges linearly to a good solution.
3. Regularization is crucial for effective greedy pruning and generalization.
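The three findings can be illustrated end to end on a toy problem. The sketch below is a hedged illustration, not the authors' code. To keep it short it replaces the random sensing operator with full observation of $M = U_{\star} U_{\star}^\top$, so the data term is $\frac{1}{4}\|UU^\top - M\|_F^2$. It also assumes the smoothed group Lasso $\sum_j \sqrt{\|u_j\|_2^2 + \beta}$ over the columns $u_j$ of $U$. The step size `eta`, weight `lam`, smoothing `beta`, threshold `tau`, and all numbers are hypothetical choices, not the values prescribed by the paper's theorems.

```python
import math

def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def residual(U, M):
    """Return U U^T - M."""
    G = matmul(U, transpose(U))
    return [[g - m for g, m in zip(g_row, m_row)] for g_row, m_row in zip(G, M)]

def loss(U, M):
    """Unregularized data term (1/4) * ||U U^T - M||_F^2."""
    return 0.25 * sum(x * x for row in residual(U, M) for x in row)

def column_norms(U, beta=0.0):
    """sqrt(||u_j||^2 + beta) for every column u_j of U (beta=0 gives the l2 norm)."""
    return [math.sqrt(sum(row[j] ** 2 for row in U) + beta) for j in range(len(U[0]))]

def gd_step(U, M, eta, lam=0.0, beta=0.01):
    """One gradient step on loss(U, M) + lam * sum_j sqrt(||u_j||^2 + beta).

    The sqrt(. + beta) smoothing of the group Lasso is an assumption of this sketch.
    """
    grad = matmul(residual(U, M), U)  # gradient of the data term: (U U^T - M) U
    if lam > 0:
        norms = column_norms(U, beta)
        # Gradient of the smoothed group Lasso: lam * u_j / sqrt(||u_j||^2 + beta).
        grad = [[g + lam * u / n for g, u, n in zip(g_row, u_row, norms)]
                for g_row, u_row in zip(grad, U)]
    return [[u - eta * g for u, g in zip(u_row, g_row)] for u_row, g_row in zip(U, grad)]

def greedy_prune(U, tau):
    """Keep only the columns of U whose l2 norm is at least tau."""
    keep = [j for j, n in enumerate(column_norms(U)) if n >= tau]
    return [[row[j] for j in keep] for row in U], keep

# Hypothetical toy instance: d = 2, r = 1, U_star = (1, 2)^T, k = 2.
M = [[1.0, 2.0], [2.0, 4.0]]     # M = U_star U_star^T, fully observed
U = [[0.9, 0.05], [1.8, 0.10]]   # overparameterized starting point

for _ in range(500):             # 1) train with the smoothed group Lasso
    U = gd_step(U, M, eta=0.05, lam=0.1)

U_pruned, kept = greedy_prune(U, tau=0.1)   # 2) greedy pruning by column norm

for _ in range(200):             # 3) fine-tune without the regularizer
    U_pruned = gd_step(U_pruned, M, eta=0.05)

print("kept columns:", kept, "final loss:", loss(U_pruned, M))
```

In this toy run the redundant second column decays toward zero during step 1. Step 2 therefore keeps only column 0, reducing k from 2 to the true rank r = 1. Step 3 then drives the loss to essentially zero at a geometric rate. Re-running step 1 with `lam=0` shows why regularization matters here. Both columns lie along $U_{\star}$ and are rescaled by the same factor, so the second column's norm settles near 0.12, above `tau`, and nothing is pruned.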

Abstract

Pruning schemes have been widely used in practice to reduce the complexity of trained models with a massive number of parameters. In fact, several practical studies have shown that if a pruned model is fine-tuned with some gradient-based updates it generalizes well to new samples. Although the above pipeline, which we refer to as pruning + fine-tuning, has been extremely successful in lowering the complexity of trained models, there is very little known about the theory behind this success. In this paper, we address this issue by investigating the pruning + fine-tuning framework on the overparameterized matrix sensing problem with the ground truth $U_{\star} \in \mathbb{R}^{d \times r}$ and the overparameterized model $U \in \mathbb{R}^{d \times k}$ with $k \gg r$. We study the approximate local minima of the mean square error, augmented with a smooth version of a group Lasso regularizer,…
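The regularized objective named in the abstract can be made concrete on a tiny instance. This is a sketch under stated assumptions, not the paper's exact definition. The data term is taken as the average squared error $\frac{1}{n}\sum_i (\langle A_i, UU^\top\rangle - y_i)^2$ over linear measurements $y_i = \langle A_i, U_{\star}U_{\star}^\top\rangle$. The Lasso groups are assumed to be the columns $u_j$ of $U$, and the smoothing $\sum_j \sqrt{\|u_j\|_2^2 + \beta}$ is one common choice. The sensing matrices, `lam`, `beta`, and the numbers are hypothetical.

```python
import math

def gram(U):
    """Return U U^T for a d x k matrix U stored as a list of rows."""
    return [[sum(a * b for a, b in zip(ri, rj)) for rj in U] for ri in U]

def inner(A, B):
    """Frobenius inner product <A, B> = sum_ij A_ij * B_ij."""
    return sum(a * b for ra, rb in zip(A, B) for a, b in zip(ra, rb))

def smooth_group_lasso(U, beta):
    """Assumed column-wise smoothing: sum_j sqrt(||u_j||^2 + beta)."""
    return sum(math.sqrt(sum(row[j] ** 2 for row in U) + beta) for j in range(len(U[0])))

def objective(U, As, ys, lam, beta):
    """Mean squared sensing error of U U^T plus lam times the smoothed group Lasso."""
    G = gram(U)
    mse = sum((inner(A, G) - y) ** 2 for A, y in zip(As, ys)) / len(ys)
    return mse + lam * smooth_group_lasso(U, beta)

# Hypothetical toy instance: d = 2, r = 1, U_star = (1, 2)^T, noiseless measurements.
M_star = gram([[1.0], [2.0]])
As = [[[1.0, 0.0], [0.0, 0.0]], [[0.0, 1.0], [1.0, 0.0]], [[0.0, 0.0], [0.0, 1.0]]]
ys = [inner(A, M_star) for A in As]

# Two overparameterized (k = 2) factors with the same product U U^T = M_star.
U_sparse = [[1.0, 0.0], [2.0, 0.0]]      # all mass in one column
s = 1 / math.sqrt(2)
U_spread = [[s, s], [2 * s, 2 * s]]      # mass split evenly over both columns

f_sparse = objective(U_sparse, As, ys, lam=0.1, beta=0.01)
f_spread = objective(U_spread, As, ys, lam=0.1, beta=0.01)
print(f_sparse, f_spread)
```

Both factors fit the measurements, since their products $UU^\top$ coincide. The regularizer nevertheless gives the lower value to the factor that concentrates its mass in a single column. This preference is what makes pruning columns by norm a sensible follow-up step.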

Peer Reviews

Decision: NeurIPS 2023 poster

Reviewer 01 · Rating: 7 (Accept: Technically solid paper, with high impact on at least one sub-area, or moderate-to-high impact on more than one areas, with good-to-excellent evaluation, resources, reproducibility, and no unaddressed ethical considerations.) · Confidence: 3

Strengths

The paper is very well written and easy to follow. The problem considered (noisy overparametrized matrix sensing) is relevant by itself and can additionally provide insights into more complicated learning models, which can be of great interest to the community. The results are […] Pruning as a technique to solve this specific problem is very well motivated, based on both Theorem 1 and the achievable statistical precision in the overparametrized setting (cf. line 312).

Weaknesses

In Theorem 3, the regularizer weight and the thresholding for pruning are both explicitly dependent on the target rank r. Thus, knowledge of r is required to a degree anyway. Consequently, the comparison made to the overparametrized setting from [23] does not seem entirely fair.

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Geophysical and Geoelectrical Methods

Methods: Pruning