Scalable Feature Matching Across Large Data Collections

David Degras

arXiv:2101.02035 · stat.CO · January 7, 2021 · J. Comput. Graph. Stat.

TL;DR

This paper introduces fast, scalable algorithms for feature vector matching across large datasets by formulating the problem as a multidimensional assignment with decomposable costs, enabling efficient large-scale applications.

Contribution

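The paper develops algorithms for multidimensional feature matching whose running time is linear in the number of datasets and whose storage is a small fraction of the data size. To the author's knowledge, they are the first methods for this problem with both properties. The approach relies on the squared Euclidean distance as the dissimilarity function.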

Findings

01 The proposed algorithms outperform competing methods in speed and accuracy in numerical experiments.

02 Running time that is linear in the number of datasets makes large collections tractable.

03 The algorithms are successfully applied to a large neuroimaging database.

Abstract

This paper is concerned with matching feature vectors in a one-to-one fashion across large collections of datasets. Formulating this task as a multidimensional assignment problem with decomposable costs (MDADC), we develop extremely fast algorithms with time complexity linear in the number $n$ of datasets and space complexity a small fraction of the data size. These remarkable properties hinge on using the squared Euclidean distance as dissimilarity function, which can reduce $\binom{n}{2}$ matching problems between pairs of datasets to $n$ problems and enable calculating assignment costs on the fly. To our knowledge, no other method applicable to the MDADC possesses these linear scaling and low-storage properties necessary to large-scale applications. In numerical experiments, the novel algorithms outperform competing methods and show excellent computational and optimization…
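
The reduction mentioned in the abstract can be illustrated on a toy example. The sketch below is not the author's implementation (that lives in the matchFeat repository); it is a minimal pure-Python check of one underlying identity, under these assumptions: the labels of all datasets but one are held fixed, every dataset has the same number `m` of vectors, and the best relabeling is found by brute force over permutations rather than by a real assignment solver. All names (`data`, `pairwise_cost`, `linear_score`) are hypothetical. With squared Euclidean distance, the total cost of matching one dataset against the `n - 1` others equals a constant minus twice an inner product with the sum of the others' vectors, so `n - 1` pairwise comparisons collapse into a single assignment problem.

```python
from itertools import permutations
import random

random.seed(0)
n, m, d = 5, 4, 3  # toy sizes: datasets, vectors per dataset, dimension

# data[i][k] is the k-th feature vector of dataset i.
# Integer entries keep all arithmetic exact, so comparisons below are exact.
data = [[[random.randint(-9, 9) for _ in range(d)] for _ in range(m)]
        for _ in range(n)]


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def sqdist(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


# Relabel dataset 0; the others keep their labels (row k belongs to group k).
target, others = data[0], data[1:]


def pairwise_cost(perm):
    """Sum of squared distances from the relabeled target to every other dataset."""
    return sum(sqdist(target[perm[k]], other[k])
               for other in others for k in range(m))


# Per-group sum of the other datasets' vectors: computed once, O(n*m*d).
others_sum = [[sum(other[k][c] for other in others) for c in range(d)]
              for k in range(m)]


def linear_score(perm):
    """Inner products of the relabeled target with the per-group sums."""
    return sum(dot(target[perm[k]], others_sum[k]) for k in range(m))


all_perms = list(permutations(range(m)))
best_pairwise = min(all_perms, key=pairwise_cost)  # uses n - 1 comparisons
best_linear = max(all_perms, key=linear_score)     # uses one summary matrix

# Terms that do not depend on the permutation.
const = ((n - 1) * sum(dot(v, v) for v in target)
         + sum(dot(v, v) for other in others for v in other))

identity_holds = all(pairwise_cost(p) == const - 2 * linear_score(p)
                     for p in all_perms)
print(best_pairwise == best_linear, identity_holds)
```

In a realistic setting the brute-force search would be replaced by a linear assignment solver applied to the `m`-by-`m` matrix of inner products between `target` and `others_sum`. The point of the sketch is only that this matrix depends on the other datasets through their per-group sums.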

Code & Models

Repositories

ddegras/matchFeat (official)

Taxonomy

Topics: Machine Learning and Algorithms · Stochastic Gradient Optimization Techniques · Statistical Methods and Inference