Scalable Feature Matching Across Large Data Collections
David Degras

TL;DR
This paper introduces fast, scalable algorithms for one-to-one matching of feature vectors across large collections of datasets, formulating the task as a multidimensional assignment problem with decomposable costs (MDADC).
Contribution
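The paper develops what the author believes to be the first algorithms applicable to the MDADC whose time complexity is linear in the number of datasets and whose storage is a small fraction of the data size. These properties rely on using the squared Euclidean distance as the dissimilarity function.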
Findings
Algorithms outperform existing methods in speed and accuracy.
Linear scaling enables handling large datasets efficiently.
Successful application to a large neuroimaging database.
Abstract
This paper is concerned with matching feature vectors in a one-to-one fashion across large collections of datasets. Formulating this task as a multidimensional assignment problem with decomposable costs (MDADC), we develop extremely fast algorithms with time complexity linear in the number of datasets and space complexity a small fraction of the data size. These remarkable properties hinge on using the squared Euclidean distance as dissimilarity function, which can reduce the $\binom{n}{2}$ matching problems between pairs of $n$ datasets to $n$ problems and enable calculating assignment costs on the fly. To our knowledge, no other method applicable to the MDADC possesses these linear scaling and low-storage properties necessary for large-scale applications. In numerical experiments, the novel algorithms outperform competing methods and show excellent computational and optimization…
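The reduction mentioned in the abstract can be illustrated with a small sketch. With the squared Euclidean distance, the sum of pairwise dissimilarities within a group of n matched vectors equals n times the sum of squared distances to the group mean, so each dataset can be matched against a set of centroids (n assignment problems) instead of against every other dataset. The code below is a minimal illustration under stated assumptions, not the author's implementation: the function names (`pairwise_cost`, `centroid_cost`, `best_assignment`, `match`) are hypothetical, the alternating k-means-style scheme is a generic choice, and the assignment step uses brute-force search over permutations, which is only usable for a handful of vectors per dataset.

```python
import itertools

def sqdist(u, v):
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def mean(vectors):
    """Componentwise mean of a list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def groups(data, perms):
    """Group k holds vector perms[i][k] of dataset i, for every dataset i."""
    n, m = len(data), len(data[0])
    return [[data[i][perms[i][k]] for i in range(n)] for k in range(m)]

def pairwise_cost(data, perms):
    """Matching cost as a sum over all pairs of datasets (quadratic in n)."""
    return sum(
        sqdist(u, v)
        for members in groups(data, perms)
        for u, v in itertools.combinations(members, 2)
    )

def centroid_cost(data, perms):
    """Same cost via group means: n * sum of squared distances to centroids."""
    n = len(data)
    total = 0.0
    for members in groups(data, perms):
        c = mean(members)
        total += sum(sqdist(u, c) for u in members)
    return n * total

def best_assignment(vectors, centroids):
    """One-to-one assignment of vectors to centroids by brute force.

    Assumption: m is tiny. A real solver (e.g. the Hungarian algorithm)
    would replace this m! search.
    """
    m = len(vectors)
    return min(
        itertools.permutations(range(m)),
        key=lambda p: sum(sqdist(vectors[p[k]], centroids[k]) for k in range(m)),
    )

def match(data, n_iter=20):
    """Alternate between updating centroids and re-matching each dataset.

    Each pass solves n assignment problems (one per dataset) and computes
    costs against the current centroids on the fly, without storing any
    pairwise cost matrix between datasets.
    """
    n, m = len(data), len(data[0])
    perms = [tuple(range(m)) for _ in range(n)]
    for _ in range(n_iter):
        centroids = [mean(members) for members in groups(data, perms)]
        new_perms = [best_assignment(data[i], centroids) for i in range(n)]
        if new_perms == perms:
            break
        perms = new_perms
    return perms
```

Here `data[i][j]` is the j-th feature vector of dataset i, and every dataset is assumed to contain the same number of vectors. As in k-means, each pass of `match` cannot increase the cost, but the result is a local optimum that depends on the initial matching.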
Taxonomy
Topics: Machine Learning and Algorithms · Stochastic Gradient Optimization Techniques · Statistical Methods and Inference
