Concentration Inequalities for Random Sets
Erel Segal-Halevi, Avinatan Hassidim

TL;DR
This paper introduces new concentration inequalities for random sets, utilizing the UI dimension to provide high-probability bounds on imbalance in sub-populations, with connections to VC dimension and Rademacher complexity.
Contribution
It defines the UI dimension as a novel measure for set-family richness and derives upper bounds on imbalance that depend on this measure and sub-population size.
Findings
The UI dimension yields high-probability upper bounds on the imbalance
Upper bounds depend only on sub-population size and UI dimension
Results relate to VC dimension and Rademacher complexity in machine learning
Abstract
In a large, possibly infinite population, each subject is colored red with probability p, independently of the others. Then, a finite sub-population is selected, possibly as a function of the coloring. The imbalance in the sub-population is defined as the difference between the number of reds in it and p times its size. This paper presents high-probability upper bounds (tail bounds) on this imbalance. To present the upper bounds we define the *UI dimension*, a new measure for the richness of a set-family. We present three simple rules for upper-bounding the UI dimension of a set-family. Our upper bounds on the imbalance in a sub-population depend only on the size of the sub-population and on the UI dimension of its support. We relate our results to known concepts from machine learning, particularly the VC dimension and Rademacher complexity.
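The definition of imbalance in the abstract can be made concrete with a small simulation. The sketch below is only an illustration of that definition, not the paper's method or its bounds: the function names, the population size, the value of p, and the two selection rules (a fixed sub-population versus one chosen after seeing the colors) are all assumptions made for the example.

```python
import random

def color_population(n, p, rng):
    """Color each of n subjects red (1) with probability p, independently."""
    return [1 if rng.random() < p else 0 for _ in range(n)]

def imbalance(colors, subpop, p):
    """Number of reds in the sub-population minus p times its size."""
    reds = sum(colors[i] for i in subpop)
    return reds - p * len(subpop)

# Illustrative parameters (assumed, not taken from the paper).
rng = random.Random(0)
n, p, k = 1000, 0.3, 50
colors = color_population(n, p, rng)

# Sub-population fixed before the coloring is seen: the first k subjects.
fixed = list(range(k))

# Sub-population chosen as a function of the coloring: the first k reds found.
adaptive = [i for i, c in enumerate(colors) if c == 1][:k]

fixed_imbalance = imbalance(colors, fixed, p)
adaptive_imbalance = imbalance(colors, adaptive, p)

print("fixed sub-population imbalance:   ", fixed_imbalance)
print("adaptive sub-population imbalance:", adaptive_imbalance)
```

The adaptive rule picks only red subjects, so its imbalance equals (1 - p) times the sub-population size, the largest positive value possible for that size. This is why a bound for sub-populations selected after the coloring has to account for how rich the family of selectable sets is, which is the role the abstract assigns to the UI dimension.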
Taxonomy
Topics: Machine Learning and Algorithms · Computability, Logic, AI Algorithms · Limits and Structures in Graph Theory
