TL;DR
This paper introduces dependent MMD coresets, a data summarization method for collections of related datasets. The summaries make it easier to compare the datasets' distributions and to see how well models generalize between them.
Contribution
The paper proposes dependent MMD coresets, a new approach for summarizing and comparing collections of related datasets to better understand their differences and implications for model performance.
Findings
Dependent MMD coresets facilitate dataset comparison.
They help identify under-represented sub-populations.
They help explain how well models generalize between related datasets.
Abstract
Understanding how two datasets differ can help us determine whether one dataset under-represents certain sub-populations, and provides insights into how well models will generalize across datasets. Representative points selected by a maximum mean discrepancy (MMD) coreset can provide interpretable summaries of a single dataset, but are not easily compared across datasets. In this paper, we introduce dependent MMD coresets, a data summarization method for collections of datasets that facilitates comparison of distributions. We show that dependent MMD coresets are useful for understanding multiple related datasets and understanding model generalization between such datasets.
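To make the abstract's starting point concrete, here is a minimal sketch of an ordinary single-dataset MMD coreset, assuming an RBF kernel, uniform weights, and a simple greedy selection rule. It is not the paper's dependent variant or the authors' implementation. The function names (rbf_kernel, mmd2, greedy_mmd_coreset) and the bandwidth parameter are hypothetical and chosen only for illustration.

```python
import numpy as np


def rbf_kernel(a, b, bandwidth=1.0):
    """RBF kernel matrix between the rows of a and the rows of b."""
    sq = (np.sum(a**2, axis=1)[:, None]
          + np.sum(b**2, axis=1)[None, :]
          - 2.0 * a @ b.T)
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * bandwidth**2))


def mmd2(x, y, bandwidth=1.0):
    """Biased (V-statistic) estimate of the squared MMD between samples x and y."""
    return (rbf_kernel(x, x, bandwidth).mean()
            - 2.0 * rbf_kernel(x, y, bandwidth).mean()
            + rbf_kernel(y, y, bandwidth).mean())


def greedy_mmd_coreset(x, size, bandwidth=1.0):
    """Greedily pick `size` row indices of x whose uniform average
    has small MMD to the full dataset (assumed selection rule)."""
    n = len(x)
    if size > n:
        raise ValueError("coreset size cannot exceed the number of points")
    k = rbf_kernel(x, x, bandwidth)
    diag = np.diag(k)
    target = k.mean(axis=1)      # mean similarity of each point to all data
    sum_sel = np.zeros(n)        # sum of k[i, j] over already selected j
    chosen = np.zeros(n, dtype=bool)
    within = 0.0                 # sum of k[a, b] over selected pairs
    tsum = 0.0                   # sum of target[a] over selected points
    selected = []
    for _ in range(size):
        m = len(selected) + 1
        # Squared MMD of (selected + candidate i) versus the full data,
        # omitting the term that does not depend on the candidate.
        score = ((within + 2.0 * sum_sel + diag) / m**2
                 - 2.0 * (tsum + target) / m)
        score[chosen] = np.inf   # never pick the same point twice
        i = int(np.argmin(score))
        within += 2.0 * sum_sel[i] + diag[i]
        tsum += target[i]
        sum_sel += k[:, i]
        chosen[i] = True
        selected.append(i)
    return selected


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=(200, 2))
    y = rng.normal(loc=0.5, size=(200, 2))
    idx = greedy_mmd_coreset(x, 10)
    print("MMD^2 between x and its coreset:", mmd2(x, x[idx]))
    print("MMD^2 between x and y:", mmd2(x, y))
```

Running this selection separately on each dataset would generally return different representative points for each one, which illustrates why the abstract describes such summaries as hard to compare across datasets.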
