CIFAR-10-Warehouse: Broad and More Realistic Testbeds in Model Generalization Analysis
Xiaoxiao Sun, Xingjian Leng, Zijian Wang, Yang Yang, Zi Huang, Liang Zheng

TL;DR
CIFAR-10-Warehouse provides a broad, realistic testbed with diverse out-of-distribution datasets to improve evaluation of model generalization and accuracy prediction in machine learning.
Contribution
The paper introduces CIFAR-10-Warehouse, a large collection of diverse datasets for better out-of-distribution testing and analysis of model generalization.
Findings
CIFAR-10-W offers extensive insights into domain generalization.
Benchmarking reveals new challenges and opportunities in out-of-distribution robustness.
Diverse datasets improve understanding of model performance in real-world scenarios.
Abstract
Analyzing model performance in various unseen environments is a critical research problem in the machine learning community. To study this problem, it is important to construct a testbed with out-of-distribution test sets that have broad coverage of environmental discrepancies. However, existing testbeds typically either have a small number of domains or are synthesized by image corruptions, hindering algorithm design that demonstrates real-world effectiveness. In this paper, we introduce CIFAR-10-Warehouse, consisting of 180 datasets collected by prompting image search engines and diffusion models in various ways. Generally sized between 300 and 8,000 images, the datasets contain natural images, cartoons, certain colors, or objects that do not naturally appear. With CIFAR-10-W, we aim to enhance the evaluation and deepen the understanding of two generalization tasks: domain…
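As a rough illustration of how a testbed with many test sets can be used to study accuracy prediction, the sketch below computes a label-free score and the true accuracy for each test set, then checks how well the score tracks accuracy across test sets. This is not the paper's method: average softmax confidence is a common baseline chosen here for illustration, and the function names, domain names, and two-class toy probabilities are all hypothetical (CIFAR-10 itself has ten classes).

```python
from math import sqrt


def accuracy(preds, labels):
    """Fraction of predictions that match the ground-truth labels."""
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def average_confidence(probs):
    """Mean of the maximum class probability; needs no labels."""
    return sum(max(row) for row in probs) / len(probs)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def evaluate_testbed(testbed):
    """Map each test-set name to (average confidence, true accuracy).

    `testbed` is a hypothetical dict: name -> (class probabilities, labels).
    """
    rows = {}
    for name, (probs, labels) in testbed.items():
        # Predicted class is the index of the largest probability.
        preds = [max(range(len(row)), key=row.__getitem__) for row in probs]
        rows[name] = (average_confidence(probs), accuracy(preds, labels))
    return rows


# Toy, made-up model outputs for three hypothetical test sets.
testbed = {
    "domain_a": ([[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.8]], [0, 0, 1, 1]),
    "domain_b": ([[0.7, 0.3], [0.6, 0.4], [0.4, 0.6], [0.7, 0.3]], [0, 1, 1, 0]),
    "domain_c": ([[0.55, 0.45], [0.6, 0.4], [0.52, 0.48], [0.45, 0.55]], [1, 0, 1, 0]),
}

rows = evaluate_testbed(testbed)
scores = [score for score, _ in rows.values()]
accs = [acc for _, acc in rows.values()]
r = pearson(scores, accs)

for name, (score, acc) in rows.items():
    print(f"{name}: avg confidence={score:.3f}, accuracy={acc:.2f}")
print(f"correlation across test sets: {r:.3f}")
```

With only a handful of test sets, a correlation like this is unstable; a collection of many test sets gives far more points, which is what makes such a comparison informative.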
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · Topic Modeling · Machine Learning and Data Classification
Methods: Diffusion
