A new class of metrics for learning on real-valued and structured data
Ruiyu Yang, Yuxiang Jiang, Scott Mathews, Elizabeth A. Housworth, Matthew W. Hahn, Predrag Radivojac

TL;DR
This paper introduces a new class of metrics applicable to sets, vectors, functions, and structured data, unifying and extending existing metrics, with proven properties and demonstrated effectiveness in data mining tasks.
Contribution
The paper presents a novel, unified class of metrics that generalize popular distances and are suitable for high-dimensional and structured data analysis.
Findings
New metrics are complete and relate to $f$-divergences.
Empirical results show improved performance over traditional metrics.
Metrics are effective for high-dimensional and structured data processing.
Abstract
We propose a new class of metrics on sets, vectors, and functions that can be used in various stages of data mining, including exploratory data analysis, learning, and result interpretation. These new distance functions unify and generalize some of the popular metrics, such as the Jaccard and bag distances on sets, Manhattan distance on vector spaces, and Marczewski-Steinhaus distance on integrable functions. We prove that the new metrics are complete and show useful relationships with $f$-divergences for probability distributions. To further extend our approach to structured objects such as concept hierarchies and ontologies, we introduce information-theoretic metrics on directed acyclic graphs drawn according to a fixed probability distribution. We conduct empirical investigation to demonstrate intuitive interpretation of the new metrics and their effectiveness on real-valued,…
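For orientation, here is a minimal Python sketch of two of the classical distances the abstract names as special cases: the Jaccard distance on finite sets and the Manhattan distance on vectors. It uses the standard textbook definitions only and does not implement the paper's new class of metrics, whose formulas are not given in this summary. The function names are illustrative, not taken from the paper.

```python
from typing import Hashable, Sequence, Set


def jaccard_distance(a: Set[Hashable], b: Set[Hashable]) -> float:
    """Jaccard distance |A symmetric-difference B| / |A union B| on finite sets."""
    union = a | b
    if not union:
        # Convention (an assumption here): two empty sets are at distance 0.
        return 0.0
    return len(a ^ b) / len(union)


def manhattan_distance(x: Sequence[float], y: Sequence[float]) -> float:
    """Manhattan (L1) distance: sum of absolute coordinate differences."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same length")
    return sum(abs(xi - yi) for xi, yi in zip(x, y))


if __name__ == "__main__":
    print(jaccard_distance({1, 2, 3}, {2, 3, 4}))      # 2 differing / 4 total
    print(manhattan_distance([1.0, 2.0, 3.0], [4.0, 0.0, 3.0]))
```

Both functions are symmetric, return zero for identical inputs, and satisfy the triangle inequality, which are the properties any metric in the generalized class would also need to satisfy.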
Taxonomy
Topics: Bayesian Methods and Mixture Models · Artificial Immune Systems Applications · Advanced Clustering Algorithms Research
