Towards The Implicit Bias on Multiclass Separable Data Under Norm Constraints
Shengping Xie, Zekun Wu, Quan Chen, Kaixu Tang

TL;DR
This paper investigates how optimization geometry shapes implicit bias on multiclass separable data. It introduces NucGD, an optimizer that enforces low-rank structure, and analyzes how stochasticity affects convergence to maximum-margin solutions.
Contribution
It introduces NucGD, a geometry-aware optimizer with nuclear norm constraints, and connects it to low-rank projection methods, providing new insights into implicit bias mechanisms.
Findings
NucGD effectively enforces low-rank structures.
Gradient noise from mini-batch sampling and momentum modulates convergence toward the expected maximum-margin solution.
An efficient SVD-free update rule, derived via asynchronous power iteration, enables scalable training.
Abstract
Implicit bias induced by gradient-based algorithms is essential to the generalization of overparameterized models, yet its mechanisms can be subtle. This work leverages the Normalized Steepest Descent (NSD) framework to investigate how optimization geometry shapes solutions on multiclass separable data. We introduce NucGD, a geometry-aware optimizer designed to enforce low-rank structures through nuclear norm constraints. Beyond the algorithm itself, we connect NucGD with emerging low-rank projection methods, providing a unified perspective. To enable scalable training, we derive an efficient SVD-free update rule via asynchronous power iteration. Furthermore, we empirically dissect the impact of stochastic optimization dynamics, characterizing how varying levels of gradient noise induced by mini-batch sampling and momentum modulate the convergence toward the expected maximum margin…
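One way to see why a nuclear-norm geometry favors low rank: the normalized steepest descent direction under the nuclear norm, i.e. the maximizer of <G, D> over ||D||_* <= 1 for a gradient G, is the rank-one matrix u1 v1^T built from the leading singular vectors of G, because the dual of the nuclear norm is the spectral norm. The sketch below illustrates that fact with plain power iteration, so no full SVD is needed. It is an illustration under stated assumptions, not the paper's NucGD update: the function names, the fixed iteration count, and the warm-start vector are hypothetical, and the asynchronous power iteration mentioned in the abstract is not reproduced here.

```python
import numpy as np


def top_singular_pair(G, v0, n_iter=50):
    """Approximate the leading singular triple (u, sigma, v) of G
    by power iteration, without computing a full SVD."""
    v = v0 / np.linalg.norm(v0)
    for _ in range(n_iter):
        u = G @ v
        u /= np.linalg.norm(u)
        v = G.T @ u
        v /= np.linalg.norm(v)
    sigma = u @ G @ v  # Rayleigh-quotient estimate of the top singular value
    return u, sigma, v


def nuc_steepest_step(W, G, lr, v0):
    """One normalized steepest descent step under the nuclear norm.

    The step direction u v^T is rank one and has nuclear norm 1.
    The returned v can seed the next call (a warm start; this reuse
    is an assumption of the sketch, not a detail from the paper).
    """
    u, _, v = top_singular_pair(G, v0)
    return W - lr * np.outer(u, v), v


# Hypothetical toy gradient with a clear gap between singular values.
G = np.array([[3.0, 0.0, 0.0],
              [0.0, 1.0, 0.0]])
W = np.zeros((2, 3))
W, v_warm = nuc_steepest_step(W, G, lr=0.1, v0=np.ones(3))
```

Each step adds a single rank-one term to the weights, which gives one intuition for the low-rank bias described in the abstract. Power iteration converges slowly when the top two singular values of G are close, so a fixed iteration count is only a placeholder.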
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Tensor decomposition and applications · Generative Adversarial Networks and Image Synthesis
