
TL;DR
This paper provides a theoretical analysis of the implicit regularization effects in Deep Linear Discriminant Analysis, revealing how the optimization process induces a specific quasi-norm conservation.
Contribution
It presents what the authors describe, to the best of their knowledge, as an initial theoretical analysis of the implicit bias in Deep LDA, showing how gradient flow on a diagonal linear network turns additive updates into multiplicative ones that preserve a quasi-norm.
Findings
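Under balanced initialization, gradient flow on an L-layer diagonal linear network trained with the Deep LDA loss turns additive gradient updates into multiplicative weight updates.
Under the same conditions, the network conserves the (2/L) quasi-norm.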
The analysis reveals the geometric effects of the discriminative metric-learning objective.
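The findings above can be connected by a short derivation. This is a sketch under stated assumptions, not the paper's own proof. The notation is introduced here for illustration: layer weights w_1, ..., w_L, effective weight β equal to their elementwise product, a loss that is scale invariant in β, and "balanced initialization" taken to mean that all layers start equal.

```latex
% Assumed notation: w_1, ..., w_L \in \mathbb{R}^d are the layer weights,
% \beta = w_1 \odot \cdots \odot w_L, and \mathcal{L}(c\beta) = \mathcal{L}(\beta) for all c > 0.
\begin{align*}
\dot{w}_\ell &= -\nabla_\beta \mathcal{L} \odot \prod_{k \neq \ell} w_k
  && \text{(gradient flow on layer } \ell\text{)} \\
\beta &= w^{\odot L}
  && \text{(equal layers stay equal, so } w_\ell(t) = w(t)\text{)} \\
\dot{\beta} &= L\, w^{\odot (L-1)} \odot \dot{w}
  = -L\, |\beta|^{2 - 2/L} \odot \nabla_\beta \mathcal{L}
  && \text{(multiplicative update)} \\
\frac{d}{dt} \sum_i |\beta_i|^{2/L}
  &= \frac{d}{dt} \sum_i w_i^2
  = -2\, \langle \beta, \nabla_\beta \mathcal{L} \rangle = 0
  && \text{(Euler's identity, degree-0 homogeneity)}
\end{align*}
```

The last line uses only scale invariance: a loss that does not change when β is rescaled has a gradient orthogonal to β, so the sum of |β_i|^(2/L) cannot change along the flow.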
Abstract
While the implicit bias (or implicit regularization) of standard loss functions has been studied, the optimization geometry induced by discriminative metric-learning objectives remains largely unexplored. To the best of our knowledge, this paper presents an initial theoretical analysis of the implicit regularization induced by Deep LDA, a scale-invariant objective designed to minimize intraclass variance and maximize interclass distance. By analyzing the gradient flow of the loss on an L-layer diagonal linear network, we prove that, under balanced initialization, the network architecture transforms standard additive gradient updates into multiplicative weight updates, which demonstrates an automatic conservation of the (2/L) quasi-norm.
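The conservation claim can be checked numerically on a toy problem. The sketch below is not the paper's experiment. It swaps in a simple scale-invariant Fisher-style ratio as a stand-in for the Deep LDA loss, uses synthetic scatter matrices `S_w` and `S_b`, takes "balanced" to mean that all layers start equal, and approximates gradient flow with small-step gradient descent. The function names and all parameter values are hypothetical.

```python
import numpy as np


def fisher_ratio_and_grad(beta, S_w, S_b):
    """Toy scale-invariant loss J(beta) = (beta^T S_w beta) / (beta^T S_b beta).

    Stand-in for a discriminative objective; NOT the Deep LDA loss itself.
    """
    num = beta @ S_w @ beta
    den = beta @ S_b @ beta
    grad = 2.0 * (S_w @ beta) / den - 2.0 * num * (S_b @ beta) / den**2
    return num / den, grad


def quasi_norm(beta, L):
    """Sum_i |beta_i|^(2/L), the quantity expected to be conserved."""
    return np.sum(np.abs(beta) ** (2.0 / L))


def simulate(L=3, d=5, steps=5000, lr=1e-4, seed=0):
    rng = np.random.default_rng(seed)
    A = rng.normal(size=(d, d))
    B = rng.normal(size=(d, d))
    S_w = A @ A.T + np.eye(d)  # synthetic "within-class" scatter (positive definite)
    S_b = B @ B.T + np.eye(d)  # synthetic "between-class" scatter (positive definite)

    # Balanced initialization (assumption): every layer starts from the same vector.
    w0 = rng.uniform(0.5, 1.5, size=d)
    W = np.tile(w0, (L, 1))  # shape (L, d), one row per diagonal layer

    beta = W.prod(axis=0)  # effective weight of the diagonal linear network
    loss_init, _ = fisher_ratio_and_grad(beta, S_w, S_b)
    q_init = quasi_norm(beta, L)

    for _ in range(steps):
        beta = W.prod(axis=0)
        _, g = fisher_ratio_and_grad(beta, S_w, S_b)
        # Chain rule: dJ/dw_l = g * prod_{k != l} w_k (elementwise).
        grads = np.stack(
            [g * np.delete(W, l, axis=0).prod(axis=0) for l in range(L)]
        )
        W = W - lr * grads  # small Euler step approximating gradient flow

    beta = W.prod(axis=0)
    loss_final, _ = fisher_ratio_and_grad(beta, S_w, S_b)
    return q_init, quasi_norm(beta, L), loss_init, loss_final


if __name__ == "__main__":
    q0, q1, l0, l1 = simulate()
    print(f"quasi-norm: {q0:.6f} -> {q1:.6f}")
    print(f"loss:       {l0:.6f} -> {l1:.6f}")
```

Because gradient descent only approximates gradient flow, the quasi-norm drifts slightly upward at a rate that shrinks with the step size, so exact conservation should be expected only in the continuous-time limit.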
