Optimizing Nondecomposable Data Dependent Regularizers via Lagrangian Reparameterization offers Significant Performance and Efficiency Gains

Sathya N. Ravi; Abhay Venkatesh; Glenn Moo Fung; Vikas Singh

arXiv:1909.12398·cs.CV·September 30, 2019

TL;DR

This paper introduces a reparameterization and dualization technique to optimize complex, nondecomposable data-dependent regularizers efficiently, significantly improving performance and scalability in machine learning tasks.

Contribution

The authors propose a novel reparameterization and partial dualization approach that enables efficient optimization of nondecomposable regularizers with minimal code changes.

Findings

01

Achieves significant performance improvements on the MSCOCO dataset

02

Provides provably cheap projection operators for the reformulated problem

03

Demonstrates improved scalability for large datasets

Abstract

Data dependent regularization is known to benefit a wide variety of problems in machine learning. Often, these regularizers cannot be easily decomposed into a sum over a finite number of terms, e.g., a sum over individual example-wise terms. The $F_{\beta}$ measure, Area under the ROC curve (AUCROC) and Precision at a fixed recall (P@R) are some prominent examples that are used in many applications. We find that for most medium to large sized datasets, scalability issues severely limit our ability to leverage the benefits of such regularizers. Importantly, the key technical impediment, despite some recent progress, is that such objectives remain difficult to optimize via backpropagation procedures. While an efficient general-purpose strategy for this problem still remains elusive, in this paper, we show that for many data-dependent nondecomposable regularizers that are relevant in…
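
To make the notion of "nondecomposable" concrete, the following is a minimal sketch, not taken from the paper, that computes $F_{\beta}$ on a toy set of labels and compares it with the average of per-mini-batch $F_{\beta}$ values. The helper names `f_beta` and `counts` and the toy labels are illustrative assumptions. If the measure were a sum of example-wise terms, the two numbers would agree; here they do not.

```python
def f_beta(tp, fp, fn, beta=1.0):
    """F_beta from confusion counts; returns 0.0 when the measure is undefined."""
    b2 = beta * beta
    denom = (1 + b2) * tp + b2 * fn + fp
    return (1 + b2) * tp / denom if denom else 0.0


def counts(y_true, y_pred):
    """True positives, false positives, false negatives for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, fn


# Toy data (made up for illustration only).
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 1, 0, 1, 1, 0, 0]

# F_1 on the full set of eight examples.
full = f_beta(*counts(y_true, y_pred))

# F_1 on two mini-batches of four examples each, then averaged.
batch_a = f_beta(*counts(y_true[:4], y_pred[:4]))
batch_b = f_beta(*counts(y_true[4:], y_pred[4:]))
mean_of_batches = (batch_a + batch_b) / 2

print(f"full-data F1:      {full:.4f}")
print(f"mean of batch F1s: {mean_of_batches:.4f}")
```

Because the full-data value depends on counts pooled across all examples, a naive mini-batch estimate of this kind can differ from it, which is one way to see why such objectives are awkward for standard stochastic gradient training.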
