Efficient Distance Metric Learning by Adaptive Sampling and Mini-Batch Stochastic Gradient Descent (SGD)
Qi Qian, Rong Jin, Jinfeng Yi, Lijun Zhang, Shenghuo Zhu

TL;DR
This paper introduces adaptive sampling and mini-batch strategies within stochastic gradient descent (SGD) to reduce the computational cost of distance metric learning (DML), chiefly by cutting the number of expensive projections onto the positive semi-definite (PSD) cone.
Contribution
The paper develops SGD-based algorithms, with theoretical guarantees, that reduce the number of PSD projections required in DML.
Findings
Reduced number of PSD projections in DML algorithms
Theoretical guarantees for adaptive sampling and mini-batch methods
Empirical validation showing improved efficiency and comparable accuracy
Abstract
Distance metric learning (DML) is an important task that has found applications in many domains. The high computational cost of DML arises from the large number of variables to be determined and the constraint that a distance metric has to be a positive semi-definite (PSD) matrix. Although stochastic gradient descent (SGD) has been successfully applied to improve the efficiency of DML, it can still be computationally expensive because in order to ensure that the solution is a PSD matrix, it has to, at every iteration, project the updated distance metric onto the PSD cone, an expensive operation. We address this challenge by developing two strategies within SGD, i.e. mini-batch and adaptive sampling, to effectively reduce the number of updates (i.e., projections onto the PSD cone) in SGD. We also develop hybrid approaches that combine the strength of adaptive sampling with that of…
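To make the cost argument in the abstract concrete, the following is a minimal sketch of the PSD projection and of a mini-batch SGD loop that projects once per batch rather than once per example. It is not the authors' implementation: the triplet hinge loss, the step size, and the names `project_psd`, `triplet_grad`, and `minibatch_sgd` are assumptions chosen for illustration, and the adaptive sampling and hybrid strategies are not shown.

```python
import numpy as np

def project_psd(M):
    """Project a symmetric matrix onto the PSD cone by clipping negative eigenvalues."""
    M = (M + M.T) / 2.0                      # symmetrize against round-off
    eigvals, eigvecs = np.linalg.eigh(M)     # eigendecomposition, O(d^3)
    clipped = np.clip(eigvals, 0.0, None)    # drop the negative part of the spectrum
    return (eigvecs * clipped) @ eigvecs.T   # V diag(clipped) V^T

def triplet_grad(M, x, xp, xn, margin=1.0):
    """Subgradient w.r.t. M of an assumed hinge loss:
    max(0, margin + d_M(x, xp) - d_M(x, xn)), with d_M(a, b) = (a-b)^T M (a-b)."""
    dp = x - xp   # difference to the similar example
    dn = x - xn   # difference to the dissimilar example
    loss = margin + dp @ M @ dp - dn @ M @ dn
    if loss <= 0:
        return np.zeros_like(M)
    return np.outer(dp, dp) - np.outer(dn, dn)

def minibatch_sgd(triplets, dim, batch_size=10, lr=0.01):
    """One pass of mini-batch SGD; returns the metric and the number of projections."""
    M = np.eye(dim)
    n_proj = 0
    for start in range(0, len(triplets), batch_size):
        batch = triplets[start:start + batch_size]
        # Average the subgradients over the batch, then project once.
        G = sum(triplet_grad(M, *t) for t in batch) / len(batch)
        M = project_psd(M - lr * G)
        n_proj += 1
    return M, n_proj
```

Under these assumptions, one pass over n triplets with batch size b performs about n/b projections instead of n, which is where the saving comes from: each projection requires a full eigendecomposition of a d × d matrix.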
Taxonomy
Topics: Face and Expression Recognition · Domain Adaptation and Few-Shot Learning · Machine Learning and ELM
Methods: Stochastic Gradient Descent
