How Many Domains Suffice for Domain Generalization? A Tight Characterization via the Domain Shattering Dimension

Cynthia Dwork; Lunjia Hu; Han Shao

arXiv:2506.16704 · cs.LG · October 27, 2025

TL;DR

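This paper introduces the domain shattering dimension, a combinatorial measure that characterizes how many randomly sampled domains are needed to learn a model that performs reasonably well on every seen and unseen domain in a family, and relates it quantitatively to the classic VC dimension.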

Contribution

The paper defines the domain shattering dimension within the PAC framework, proves that it characterizes the domain sample complexity, and establishes a tight quantitative relationship between it and the VC dimension.

Findings

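1. The domain shattering dimension characterizes the domain sample complexity, i.e., the number of randomly sampled domains needed.

2. Every hypothesis class that is learnable in the standard PAC setting is also learnable in the domain generalization setting.

3. The domain shattering dimension and the classic VC dimension are linked by a tight quantitative relationship.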

Abstract

We study a fundamental question of domain generalization: given a family of domains (i.e., data distributions), how many randomly sampled domains do we need to collect data from in order to learn a model that performs reasonably well on every seen and unseen domain in the family? We model this problem in the PAC framework and introduce a new combinatorial measure, which we call the domain shattering dimension. We show that this dimension characterizes the domain sample complexity. Furthermore, we establish a tight quantitative relationship between the domain shattering dimension and the classic VC dimension, demonstrating that every hypothesis class that is learnable in the standard PAC setting is also learnable in our setting.
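
For readers less familiar with the classic VC dimension that the abstract compares against, the following is a minimal brute-force sketch of shattering on a finite set of points. It illustrates only the standard VC dimension, not the paper's domain shattering dimension, whose definition is not given on this page. The names `is_shattered`, `vc_dimension`, `thresholds`, and `intervals` are hypothetical and chosen for illustration.

```python
from itertools import combinations


def is_shattered(points, hypotheses):
    """Return True if the hypotheses realize all 2^k labelings of the k points."""
    labelings = {tuple(h(x) for x in points) for h in hypotheses}
    return len(labelings) == 2 ** len(points)


def vc_dimension(domain, hypotheses):
    """Largest k such that some k-subset of `domain` is shattered (brute force)."""
    best = 0
    for k in range(1, len(domain) + 1):
        if any(is_shattered(s, hypotheses) for s in combinations(domain, k)):
            best = k
        else:
            # If no k-subset is shattered, no larger subset can be either.
            break
    return best


# Toy instance space (an assumption for illustration only).
domain = [0, 1, 2, 3]

# Threshold classifiers: label 1 iff x >= t.
thresholds = [lambda x, t=t: int(x >= t) for t in range(5)]

# Interval classifiers: label 1 iff a <= x <= b, plus the all-zero hypothesis.
intervals = [
    lambda x, a=a, b=b: int(a <= x <= b)
    for a in range(4)
    for b in range(a, 4)
] + [lambda x: 0]

print(vc_dimension(domain, thresholds))  # thresholds cannot label (1, 0) on x < y
print(vc_dimension(domain, intervals))   # intervals cannot label (1, 0, 1)
```

On this toy domain, thresholds shatter single points but no pair, while intervals shatter pairs but no triple, which matches the textbook VC dimensions of 1 and 2 for these classes.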
