Optimizing Nondecomposable Data Dependent Regularizers via Lagrangian Reparameterization offers Significant Performance and Efficiency Gains
Sathya N. Ravi, Abhay Venkatesh, Glenn Moo Fung, Vikas Singh

TL;DR
This paper introduces a reparameterization and dualization technique to optimize complex, nondecomposable data-dependent regularizers efficiently, significantly improving performance and scalability in machine learning tasks.
Contribution
The authors propose a novel reparameterization and partial dualization approach that enables efficient optimization of nondecomposable regularizers with minimal code changes.
Findings
Achieves significant performance improvements on the MSCOCO dataset
Provides provably cheap projection operators for the reformulated problem
Demonstrates improved scalability for large datasets
Abstract
Data dependent regularization is known to benefit a wide variety of problems in machine learning. Often, these regularizers cannot be easily decomposed into a sum over a finite number of terms, e.g., a sum over individual example-wise terms. The F_β measure, Area under the ROC curve (AUCROC) and Precision at a fixed recall (P@R) are some prominent examples that are used in many applications. We find that for most medium to large sized datasets, scalability issues severely limit our ability in leveraging the benefits of such regularizers. Importantly, the key technical impediment, despite some recent progress, is that such objectives remain difficult to optimize via backpropagation procedures. While an efficient general-purpose strategy for this problem still remains elusive, in this paper, we show that for many data-dependent nondecomposable regularizers that are relevant in…
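To make "nondecomposable" concrete, the sketch below computes AUCROC directly from its pairwise definition: the fraction of (positive, negative) pairs that the scores rank correctly. It is an illustration only and not the paper's reparameterization or dualization method; the function name auc_pairwise and the toy scores are assumptions made for this example.

```python
def auc_pairwise(scores, labels):
    """AUCROC as the fraction of (positive, negative) pairs ranked correctly.

    Illustrative helper (hypothetical name), not code from the paper.
    Ties between a positive and a negative score count as half correct.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")

    correct = 0.0
    for p in pos:          # every positive example ...
        for n in neg:      # ... is compared with every negative example
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos) * len(neg))


# Toy data (assumed for illustration): three positives, two negatives.
scores = [0.9, 0.8, 0.3, 0.6, 0.2]
labels = [1, 1, 1, 0, 0]
auc = auc_pairwise(scores, labels)
print(f"AUC = {auc:.4f}")
```

Each term in the double loop depends on two examples at once, so the measure is not a sum of terms that each depend on a single example. This is the structure that makes such objectives awkward to handle with a standard per-example loss.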
