$k$-Variance: A Clustered Notion of Variance
Justin Solomon, Kristjan Greenewald, Haikady N. Nagaraja

TL;DR
This paper introduces $k$-variance, a new measure based on random bipartite matchings that captures local distributional information and can be efficiently approximated, with analysis and experiments demonstrating its properties.
Contribution
It defines $k$-variance, proves its fundamental properties, and analyzes its behavior in various distributional settings, providing a novel tool for understanding distribution shape.
Findings
As $k$ increases, $k$-variance captures local rather than global information about a distribution.
It can be approximated efficiently via sampling and linear programming.
The analysis covers one-dimensional measures, clustered measures, and measures concentrated on low-dimensional subsets.
Abstract
We introduce $k$-variance, a generalization of variance built on the machinery of random bipartite matchings. $k$-variance measures the expected cost of matching two sets of $k$ samples from a distribution to each other, capturing local rather than global information about a measure as $k$ increases; it is easily approximated stochastically using sampling and linear programming. In addition to defining $k$-variance and proving its basic properties, we provide in-depth analysis of this quantity in several key cases, including one-dimensional measures, clustered measures, and measures concentrated on low-dimensional subsets of $\mathbb{R}^n$. We conclude with experiments and open problems motivated by this new way to summarize distributional shape.
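The sampling procedure described in the abstract can be illustrated with a minimal sketch for the one-dimensional case. It rests on several assumptions that are not taken from the paper: the cost is squared distance, the matching is solved by sorting (which is optimal in one dimension for this cost, so no linear program is needed here), and the paper's normalization constant is omitted, so the function returns only the raw expected matching cost. The names `matching_cost_1d`, `expected_matching_cost`, and `gaussian` are hypothetical. For $k = 1$ the raw cost is $\mathbb{E}|X - Y|^2$, which equals twice the ordinary variance.

```python
import random


def matching_cost_1d(xs, ys):
    """Average squared cost of the optimal matching of two equal-size 1-D samples.

    In one dimension with squared-distance cost, pairing the i-th smallest
    point of xs with the i-th smallest point of ys is an optimal matching.
    """
    if len(xs) != len(ys):
        raise ValueError("both samples must have the same size")
    pairs = zip(sorted(xs), sorted(ys))
    return sum((a - b) ** 2 for a, b in pairs) / len(xs)


def expected_matching_cost(sampler, k, trials=2000, seed=0):
    """Monte Carlo estimate of the expected matching cost between two
    independent samples of size k. `sampler(rng)` draws one point.

    Assumption: unnormalized; any scaling used in the paper is not applied.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        xs = [sampler(rng) for _ in range(k)]
        ys = [sampler(rng) for _ in range(k)]
        total += matching_cost_1d(xs, ys)
    return total / trials


def gaussian(rng):
    """Hypothetical example distribution: standard normal."""
    return rng.gauss(0.0, 1.0)


if __name__ == "__main__":
    for k in (1, 2, 8, 32):
        print(k, expected_matching_cost(gaussian, k))
```

In higher dimensions sorting no longer yields the optimal matching, and the sorted pairing would have to be replaced by an assignment or linear programming solver, as the abstract indicates.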
Taxonomy
Topics: Bayesian Methods and Mixture Models · Data Management and Algorithms · Topological and Geometric Data Analysis
