TL;DR
This paper introduces a similarity learning method for high-dimensional sparse data. It uses a greedy approximate Frank-Wolfe algorithm whose convergence rate, time complexity, and memory complexity are independent of the data dimension, and it comes with generalization guarantees.
Contribution
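It parameterizes similarity functions as convex combinations of sparse rank-one matrices and learns them with a greedy approximate Frank-Wolfe algorithm that controls the number of active features. The modeling choices are backed by theoretical generalization guarantees.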
Findings
The algorithm's convergence rate, time complexity, and memory complexity are independent of the data dimension.
The method outperforms competitors on datasets with up to one million features.
The generalization error depends logarithmically on sparsity rather than on the number of features.
Abstract
Similarity and metric learning provides a principled approach to construct a task-specific similarity from weakly supervised data. However, these methods are subject to the curse of dimensionality: as the number of features grows large, poor generalization is to be expected and training becomes intractable due to high computational and memory costs. In this paper, we propose a similarity learning method that can efficiently deal with high-dimensional sparse data. This is achieved through a parameterization of similarity functions by convex combinations of sparse rank-one matrices, together with the use of a greedy approximate Frank-Wolfe algorithm which provides an efficient way to control the number of active features. We show that the convergence rate of the algorithm, as well as its time and memory complexity, are independent of the data dimension. We further provide a theoretical…
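The two ingredients named in the abstract, a similarity matrix built as a convex combination of sparse rank-one matrices and a greedy Frank-Wolfe update that adds one such matrix per iteration, can be illustrated with a small sketch. Everything specific here is an assumption made for illustration and is not taken from the text above: the bilinear similarity x^T M y, the atoms lam * (e_i +/- e_j)(e_i +/- e_j)^T, the smoothed hinge loss on triplets, the 2/(k+2) step size, and all function names. The sketch also stores M densely for readability, so it does not reproduce the dimension-independent cost claimed by the paper.

```python
import numpy as np

def smoothed_hinge_grad(M, X, Y, Z):
    """Mean smoothed hinge loss over triplets and its gradient w.r.t. M.

    Each triplet (x, y, z) asks that x be more similar to y than to z
    under the (assumed) bilinear similarity S_M(a, b) = a^T M b.
    """
    D = Y - Z
    s = np.einsum("ni,ij,nj->n", X, M, D)  # margins x^T M (y - z)
    loss = np.where(s >= 1, 0.0, np.where(s <= 0, 0.5 - s, 0.5 * (1 - s) ** 2))
    dl = np.where(s >= 1, 0.0, np.where(s <= 0, -1.0, -(1 - s)))
    G = (X * dl[:, None]).T @ D / len(s)
    return loss.mean(), G

def best_atom(G, lam):
    """Linear minimization oracle over atoms lam * (e_i +/- e_j)(e_i +/- e_j)^T, i < j.

    <atom, G> is proportional to G_ii + G_jj +/- (G_ij + G_ji), so only
    four entries of G matter per candidate atom.
    """
    d = G.shape[0]
    i, j = np.triu_indices(d, k=1)
    diag = np.diag(G)
    base = diag[i] + diag[j]
    cross = G[i, j] + G[j, i]
    scores = np.concatenate([base + cross, base - cross])  # sign +1, then sign -1
    k = int(np.argmin(scores))
    sign = 1.0 if k < len(i) else -1.0
    k %= len(i)
    v = np.zeros(d)
    v[i[k]] = 1.0
    v[j[k]] = sign
    return lam * np.outer(v, v)  # at most 4 nonzero entries

def frank_wolfe(X, Y, Z, lam=1.0, iters=50, tol=1e-9):
    """Greedy Frank-Wolfe: each iteration mixes one sparse atom into M."""
    d = X.shape[1]
    M = np.zeros((d, d))
    for k in range(iters):
        _, G = smoothed_hinge_grad(M, X, Y, Z)
        S = best_atom(G, lam)
        gap = np.sum((M - S) * G)  # Frank-Wolfe gap, used as stopping test
        if gap <= tol:
            break
        gamma = 2.0 / (k + 2)  # gamma = 1 at k = 0, so M becomes the first atom
        M = (1 - gamma) * M + gamma * S
    return M

# Toy triplets in 5 dimensions: x should be closer to y than to z.
X = np.array([[1, 1, 0, 0, 0], [1, 0, 0, 0, 0]], dtype=float)
Y = np.array([[1, 1, 0, 0, 0], [0, 1, 0, 0, 0]], dtype=float)
Z = np.array([[0, 0, 1, 1, 0], [0, 0, 0, 0, 1]], dtype=float)
M = frank_wolfe(X, Y, Z)
final_loss, _ = smoothed_hinge_grad(M, X, Y, Z)
```

Because every atom touches only two features, the number of iterations bounds the number of active features in M (at most two new features per step), which is the sense in which early stopping of a greedy scheme like this acts as feature selection.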
