Lexicographic and Depth-Sensitive Margins in Homogeneous and Non-Homogeneous Deep Models

Mor Shpigel Nacson; Suriya Gunasekar; Jason D. Lee; Nathan Srebro; Daniel Soudry

arXiv:1905.07325 · stat.ML · May 20, 2019 · 25 cites

TL;DR

This paper investigates how infinitesimal regularization or gradient descent leads to margin-maximizing solutions in deep models, characterizing the limiting solutions of homogeneous and non-homogeneous architectures and the conditions under which gradient descent reaches them.

Contribution

It extends previous work, which studied infinitesimal regularization only in homogeneous models, to both homogeneous and non-homogeneous deep models, characterizing their margin-maximizing solutions and the conditions under which gradient descent converges to them.
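
The objects these results concern can be written compactly. The block below is a sketch in notation assumed for this page, not copied from the paper: training pairs (x_n, y_n) with y_n in {-1, +1}, a norm bound B, a decreasing loss such as the exponential loss, and the symbol gamma for the normalized margin. The paper's exact assumptions on the loss and the norm may differ.

```latex
\begin{aligned}
&\text{degree-}L\text{ homogeneity:} && f(c\,w;x) = c^{L} f(w;x) \quad \forall c>0,\\
&\text{ensemble model:} && f(w;x) = \textstyle\sum_{k} f_k(w_k;x),\quad f_k \text{ homogeneous of degree } L_k,\\
&\text{training loss:} && \mathcal{L}(w) = \textstyle\sum_{n=1}^{N} \ell\big(y_n f(w;x_n)\big),\quad \text{e.g. } \ell(u)=e^{-u},\\
&\text{constrained path:} && w(B) \in \operatorname*{arg\,min}_{\|w\|\le B} \mathcal{L}(w),\\
&\text{margin path:} && w_{m}(B) \in \operatorname*{arg\,max}_{\|w\|\le B}\ \min_{n}\, y_n f(w;x_n),\\
&\text{normalized margin:} && \gamma(w) = \min_{n} y_n f(w;x_n)\,/\,\|w\|^{L}.
\end{aligned}
```

For a degree-L homogeneous model, rescaling w by c multiplies every margin by c^L, which is why the margin is divided by the L-th power of the norm to make it scale-invariant.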

Findings

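01. For non-homogeneous ensemble models, the limiting solution discards the shallowest sub-models when they are unnecessary.

02. For homogeneous models, the limiting solution is a lexicographic max-margin solution.

03. Conditions are given under which max-margin solutions are also attained as the limit of unconstrained gradient descent.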
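
Why would a diverging norm budget push weight away from shallow sub-models? A homogeneous sub-model of degree L scales like B^L, so deeper terms eventually dominate. The toy below is a hypothetical illustration, not the paper's construction or proof: an ensemble f(w; x) = a*x + (b*c)*x of a degree-1 term and a degree-2 term, trained on 1-D data that the deep term can separate on its own. The function name `toy_margin_path` and the brute-force grid search are assumptions made for this sketch.

```python
import math


def toy_margin_path(B, grid=20001):
    """Brute-force margin path for a hypothetical two-sub-model ensemble.

    Model (illustrative only): f(w; x) = a*x + (b*c)*x with w = (a, b, c).
    The a-term is degree-1 homogeneous (a "shallow" sub-model); the
    (b*c)-term is degree-2 homogeneous (a "deeper" sub-model).
    With m0 = min_n y_n * x_n > 0, the smallest margin is m0 * (a + b*c)
    whenever a + b*c >= 0, so maximizing it under a^2 + b^2 + c^2 <= B^2
    means maximizing a + b*c. For a fixed deep budget b^2 + c^2 = 2*t^2,
    b*c is largest at b = c = t (AM-GM); spending the whole budget never
    hurts, and a negative a only lowers the objective, so we scan a in [0, B].
    """
    best_a, best_val = 0.0, -math.inf
    for i in range(grid):
        a = B * i / (grid - 1)          # candidate shallow weight
        t_sq = (B * B - a * a) / 2.0    # t^2 = b*c left for the deep part
        val = a + t_sq                  # objective a + b*c
        if val > best_val:
            best_a, best_val = a, val
    return best_a, best_val


for B in (0.5, 2.0, 10.0, 100.0):
    a, val = toy_margin_path(B)
    # a / B is the share of the norm budget held by the shallow sub-model.
    print(f"B={B}: a={a:.3f}, a/B={a / B:.3f}, objective={val:.2f}")
```

In closed form, the optimum is a = B for B <= 1 (only the shallow term is used) and a = 1 for B >= 1, so the normalized shallow weight a/B = 1/B vanishes as B grows while the deep term takes over the direction.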

Abstract

With an eye toward understanding complexity control in deep learning, we study how infinitesimal regularization or gradient descent optimization leads to margin-maximizing solutions in both homogeneous and non-homogeneous models, extending previous work that focused on infinitesimal regularization only in homogeneous models. To this end, we study the limit of loss minimization with a diverging norm constraint (the "constrained path"), relate it to the limit of a "margin path", and characterize the resulting solution. For non-homogeneous ensemble models, whose output is a sum of homogeneous sub-models, we show that this solution discards the shallowest sub-models if they are unnecessary. For homogeneous models, we show convergence to a "lexicographic max-margin solution", and provide conditions under which max-margin solutions are also attained as the limit of unconstrained gradient descent.
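
A lexicographic max-margin criterion can be read as: maximize the smallest normalized margin, then, among the maximizers, the second smallest, and so on. This page does not reproduce the paper's formal definition, so treat that sorted-margin reading as an assumption. The sketch below only applies the ordering to a finite, hand-picked set of candidate directions for a linear (degree-1 homogeneous) model; it does not solve the optimization. The data, candidates, and function names are hypothetical.

```python
import math


def normalized_margins(w, X, y):
    """Sorted margins y_n * <w, x_n> / ||w|| of a linear model.

    A linear model is degree-1 homogeneous, so dividing by ||w|| makes the
    margins invariant to rescaling w.
    """
    norm = math.sqrt(sum(wi * wi for wi in w))
    ms = [yn * sum(wi * xi for wi, xi in zip(w, xn)) / norm
          for xn, yn in zip(X, y)]
    return sorted(ms)  # ascending: smallest margin first


def lexicographic_best(candidates, X, y):
    """Pick the candidate whose sorted margin vector is lexicographically largest.

    Python compares lists lexicographically, so this first maximizes the
    smallest margin, breaks ties by the second smallest, and so on.
    """
    return max(candidates, key=lambda w: normalized_margins(w, X, y))


# Hypothetical data: both candidates tie on the smallest margin (point 1),
# so the second-smallest margin (point 2) decides.
X = [(1.0, 1.0), (2.0, 3.0)]
y = [1, 1]
candidates = [(1.0, 0.0), (0.0, 1.0)]
for w in candidates:
    print(w, normalized_margins(w, X, y))
print("lexicographic choice:", lexicographic_best(candidates, X, y))
```

Here (1.0, 0.0) gives sorted margins [1.0, 2.0] and (0.0, 1.0) gives [1.0, 3.0]; the minimum-margin criterion alone cannot separate them, while the lexicographic order picks (0.0, 1.0).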

Taxonomy

Topics: Stochastic Gradient Optimization Techniques · Sparse and Compressive Sensing Techniques · Machine Learning and Algorithms