
TL;DR
This paper analyzes the generalized min-max (GMM) kernel theoretically: it derives the limit of GMM, proves consistency and asymptotic normality, and compares GMM with cosine similarity for data from elliptical distributions, a family that includes the multivariate t-distribution.
Contribution
It provides the first theoretical results for GMM (its limit, consistency, and asymptotic normality), supporting its use as a robust similarity measure in machine learning applications.
Findings
GMM converges to a theoretical limit, and consistency is proven, when the data follow an elliptical distribution with bounded first moment.
GMM outperforms cosine similarity for t-distributed data with few degrees of freedom, that is, heavy-tailed data.
GMM can be efficiently computed and used for near neighbor search.
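To make the comparison in these findings concrete, here is a minimal sketch of how GMM is commonly defined. This page does not state the formula, so the definition below is an assumption: each vector is split into its positive and negative parts (doubling the dimension so every entry is nonnegative), and the similarity is the sum of coordinate-wise minima divided by the sum of coordinate-wise maxima. The function names `gmm_similarity` and `cosine_similarity`, and the convention of returning 0 for two all-zero vectors, are illustrative choices.

```python
import numpy as np


def _nonnegative_expand(x):
    """Split x into positive and negative parts (assumed GMM transform).

    The parts are concatenated rather than interleaved; the ordering does
    not matter because GMM only sums over coordinates.
    """
    x = np.asarray(x, dtype=float)
    return np.concatenate([np.maximum(x, 0.0), np.maximum(-x, 0.0)])


def gmm_similarity(u, v):
    """Sum of coordinate-wise minima over sum of coordinate-wise maxima."""
    u_t = _nonnegative_expand(u)
    v_t = _nonnegative_expand(v)
    denom = np.maximum(u_t, v_t).sum()
    if denom == 0.0:
        # Both vectors are all zeros; returning 0 is an arbitrary convention.
        return 0.0
    return float(np.minimum(u_t, v_t).sum() / denom)


def cosine_similarity(u, v):
    """Standard cosine similarity, shown only for comparison."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


if __name__ == "__main__":
    u, v = [2.0, -1.0], [1.0, -3.0]
    print("GMM:   ", gmm_similarity(u, v))
    print("cosine:", cosine_similarity(u, v))
```

For `u = [2, -1]` and `v = [1, -3]`, the expanded vectors are `[2, 0, 0, 1]` and `[1, 0, 0, 3]`, so the minima sum to 2, the maxima sum to 5, and GMM is 0.4.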
Abstract
We develop some theoretical results for a robust similarity measure named "generalized min-max" (GMM). This similarity has direct applications in machine learning as a positive definite kernel and can be efficiently computed via probabilistic hashing. Owing to the discrete nature, the hashed values can also be used for efficient near neighbor search. We prove the theoretical limit of GMM and the consistency result, assuming that the data follow an elliptical distribution, which is a very general family of distributions and includes the multivariate t-distribution as a special case. The consistency result holds as long as the data have bounded first moment (an assumption which essentially holds for datasets commonly encountered in practice). Furthermore, we establish the asymptotic normality of GMM. Compared to the "cosine" similarity which is routinely adopted in current practice in…
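The abstract mentions probabilistic hashing without naming a scheme. As a hedged illustration, the sketch below assumes the hashing is consistent weighted sampling (CWS) applied to the nonnegative expansion of each vector, for which the probability that two hashes collide equals the min-max similarity. The function names `cws_hashes` and `estimate_gmm`, the seed handling, and the number of hashes are hypothetical choices, not taken from the paper.

```python
import numpy as np


def cws_hashes(x, num_hashes, seed=0):
    """Return num_hashes samples (index, t) via consistent weighted sampling.

    Assumption: x is first expanded into positive and negative parts so all
    weights are nonnegative. Vectors that are compared must share the same
    seed and dimension so they see identical random numbers.
    """
    x = np.asarray(x, dtype=float)
    w = np.concatenate([np.maximum(x, 0.0), np.maximum(-x, 0.0)])
    pos = w > 0
    if not pos.any():
        raise ValueError("CWS is undefined for an all-zero vector")

    rng = np.random.default_rng(seed)
    shape = (num_hashes, w.size)
    r = rng.gamma(2.0, 1.0, size=shape)      # r ~ Gamma(2, 1)
    c = rng.gamma(2.0, 1.0, size=shape)      # c ~ Gamma(2, 1)
    beta = rng.uniform(0.0, 1.0, size=shape)  # beta ~ Uniform(0, 1)

    # Only coordinates with positive weight can be sampled.
    r, c, beta = r[:, pos], c[:, pos], beta[:, pos]
    t = np.floor(np.log(w[pos]) / r + beta)
    # log a = log c - r * (t - beta + 1); the minimizer is the sampled index.
    log_a = np.log(c) - r * (t - beta + 1.0)
    j = np.argmin(log_a, axis=1)

    index = np.flatnonzero(pos)[j]
    t_star = t[np.arange(num_hashes), j].astype(np.int64)
    return np.stack([index, t_star], axis=1)


def estimate_gmm(u, v, num_hashes=4000, seed=0):
    """Estimate GMM as the fraction of hashes on which u and v collide."""
    h_u = cws_hashes(u, num_hashes, seed)
    h_v = cws_hashes(v, num_hashes, seed)
    return float(np.all(h_u == h_v, axis=1).mean())


if __name__ == "__main__":
    # The exact min-max similarity of these two vectors is 0.4.
    print(estimate_gmm([2.0, -1.0], [1.0, -3.0]))
```

Because each hash is a pair of integers, collisions can be checked by exact equality, which is what makes the hashed values usable as keys in hash tables for near neighbor search.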
Taxonomy
Topics: Statistical Mechanics and Entropy · Face and Expression Recognition · Bayesian Methods and Mixture Models
