Optimization with Access to Auxiliary Information
El Mahdi Chayti, Sai Praneeth Karimireddy

TL;DR
This paper introduces algorithms that use auxiliary information with cheap gradients to minimize a target function with costly gradients more efficiently. The approach applies to a range of machine learning settings.
Contribution
The paper proposes two generic algorithms leveraging auxiliary information and proves their effectiveness under Hessian similarity assumptions.
Findings
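The algorithms improve on methods that ignore the auxiliary information when the target and auxiliary Hessians are close, i.e., when the Hessian similarity measure is small.
In stochastic settings, there is a potential further benefit when the auxiliary gradient noise is correlated with that of the target function.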
Framework applicable to multiple practical scenarios like federated learning and transfer learning.
Abstract
We investigate the fundamental optimization question of minimizing a target function, whose gradients are expensive to compute or have limited availability, given access to some auxiliary side function whose gradients are cheap or more available. This formulation captures many settings of practical relevance, such as i) re-using batches in SGD, ii) transfer learning, iii) federated learning, iv) training with compressed models/dropout, etc. We propose two generic new algorithms that apply in all these settings; we also prove that we can benefit from this framework under the Hessian similarity assumption between the target and side information. A benefit is obtained when this similarity measure is small; we also show a potential benefit from stochasticity when the auxiliary noise is correlated with that of the target function.
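To make the setting in the abstract concrete, the following is a minimal sketch of one way cheap auxiliary gradients can stand in for expensive target gradients. It is not the paper's algorithm. It assumes a hypothetical target `f` and auxiliary `h` that are both quadratics with nearly equal Hessians, and it uses a simple bias-corrected surrogate gradient (the auxiliary gradient shifted so that it matches the target gradient at the last point where the target was queried). The names `grad_f`, `grad_h`, `aux_corrected_descent`, and all numeric values are illustrative assumptions.

```python
import numpy as np

def make_quadratic_grad(A, b):
    """Gradient of the quadratic 0.5 * x^T A x - b^T x."""
    return lambda x: A @ x - b

def aux_corrected_descent(grad_f, grad_h, x0, outer_steps=20, inner_steps=10, lr=0.3):
    """Illustrative sketch (an assumption, not the paper's method).

    Each outer step queries the expensive target gradient once, then takes
    several cheap steps using the auxiliary gradient, shifted so that it
    agrees with the target gradient at the query point.
    """
    x = x0.copy()
    target_calls = 0
    for _ in range(outer_steps):
        g_f = grad_f(x)          # one expensive target gradient
        target_calls += 1
        g_h = grad_h(x)          # cheap auxiliary gradient at the same point
        y = x.copy()
        for _ in range(inner_steps):
            # Bias-corrected surrogate: equals grad_f at y == x.
            y = y - lr * (grad_h(y) - g_h + g_f)
        x = y
    return x, target_calls

# Hypothetical problem: the auxiliary Hessian is a small perturbation of the
# target Hessian, while the linear terms differ completely.
A_f = np.diag([1.0, 2.0, 3.0])
E = 0.1 * np.array([[0.5, 0.2, 0.0],
                    [0.2, -0.3, 0.1],
                    [0.0, 0.1, 0.4]])
b_f = np.array([1.0, 1.0, 1.0])
b_h = np.array([0.5, -1.0, 2.0])

grad_f = make_quadratic_grad(A_f, b_f)
grad_h = make_quadratic_grad(A_f + E, b_h)

x_star = np.linalg.solve(A_f, b_f)  # minimizer of the target
x_hat, calls = aux_corrected_descent(grad_f, grad_h, np.zeros(3))
print("distance to target minimizer:", np.linalg.norm(x_hat - x_star))
print("target gradient evaluations:", calls)
```

In this toy setup the target gradient is evaluated only once per outer step, and the remaining progress comes from the auxiliary function. The correction term removes the mismatch in the linear terms, so the quality of each outer step depends on how close the two Hessians are, which is the intuition behind a benefit appearing when the similarity measure is small.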
Taxonomy
Topics: Stochastic Gradient Optimization Techniques · Privacy-Preserving Technologies in Data · Domain Adaptation and Few-Shot Learning
Methods: Stochastic Gradient Descent
