$k$-Variance: A Clustered Notion of Variance

Justin Solomon; Kristjan Greenewald; Haikady N. Nagaraja

arXiv:2012.06958·math.ST·December 15, 2020

TL;DR

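This paper introduces $k$-variance, a generalization of variance defined as the expected cost of matching two sets of $k$ random samples from a distribution to each other. It captures increasingly local distributional information as $k$ grows and can be approximated stochastically using sampling and linear programming; the paper supports the definition with analysis and experiments.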

Contribution

The paper defines $k$-variance, proves its basic properties, and analyzes its behavior for one-dimensional measures, clustered measures, and measures concentrated on low-dimensional subsets of $\mathbb{R}^n$, offering a new way to summarize distributional shape.

Findings

01

As $k$ increases, $k$-variance captures local rather than global information about a measure.

02

It can be approximated stochastically using sampling and linear programming.

03

The analysis covers one-dimensional measures, clustered measures, and measures concentrated on low-dimensional subsets of $\mathbb{R}^n$.

Abstract

We introduce $k$-variance, a generalization of variance built on the machinery of random bipartite matchings. $k$-variance measures the expected cost of matching two sets of $k$ samples from a distribution to each other, capturing local rather than global information about a measure as $k$ increases; it is easily approximated stochastically using sampling and linear programming. In addition to defining $k$-variance and proving its basic properties, we provide in-depth analysis of this quantity in several key cases, including one-dimensional measures, clustered measures, and measures concentrated on low-dimensional subsets of $\mathbb{R}^n$. We conclude with experiments and open problems motivated by this new way to summarize distributional shape.
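The sampling procedure described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and it rests on several assumptions: the function names `matching_cost`, `estimate_matching_cost`, and `uniform_square` are hypothetical; the cost is taken to be squared Euclidean distance; the optimal matching is found by brute force over permutations (practical only for small $k$) rather than by the linear programming the abstract mentions; and the paper's normalization constant is omitted, so the sketch returns the raw expected per-point matching cost.

```python
import itertools
import random


def matching_cost(xs, ys):
    """Mean squared Euclidean cost of the cheapest one-to-one matching of xs to ys.

    xs and ys are equal-length lists of coordinate tuples.
    Brute force over all permutations: an assumption for illustration,
    standing in for a linear-programming or assignment solver.
    """
    k = len(xs)
    best = float("inf")
    for perm in itertools.permutations(range(k)):
        cost = sum(
            sum((a - b) ** 2 for a, b in zip(xs[i], ys[j]))
            for i, j in enumerate(perm)
        )
        best = min(best, cost)
    return best / k


def estimate_matching_cost(sampler, k, trials=2000, seed=0):
    """Monte Carlo estimate of the expected matching cost between two
    independent sets of k samples drawn with sampler(rng).

    Unnormalized: the paper's scaling constant is deliberately left out.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        xs = [sampler(rng) for _ in range(k)]
        ys = [sampler(rng) for _ in range(k)]
        total += matching_cost(xs, ys)
    return total / trials


def uniform_square(rng):
    """Hypothetical example distribution: uniform on the unit square."""
    return (rng.random(), rng.random())
```

For $k = 1$ the matching is forced, so the estimate approaches $\mathbb{E}\|X - Y\|^2$, which is twice the total variance of the distribution; this is the sense in which the quantity generalizes variance. Calling `estimate_matching_cost(uniform_square, k)` for a few small values of `k` shows how the per-point cost changes as the matching becomes more local.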

Taxonomy

Topics: Bayesian Methods and Mixture Models · Data Management and Algorithms · Topological and Geometric Data Analysis