Harnessing Synthetic Datasets: The Role of Shape Bias in Deep Neural Network Generalization
Elior Benarous, Sotiris Anagnostidis, Luca Biggio, Thomas Hofmann

TL;DR
This paper examines how the shape bias of neural networks trained on synthetic datasets relates to dataset quality and generalization. It shows the limits of shape bias as a predictor of generalization and proposes using it to estimate dataset diversity.
Contribution
The paper critically analyzes shape bias as a predictor of generalization, highlights its variability across architectures and supervision types, and proposes using it to estimate dataset diversity.
Findings
Shape bias varies across architectures and supervision types.
Shape bias alone is unreliable for estimating generalization.
Shape bias can be used to estimate dataset diversity.
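For readers unfamiliar with the metric, here is a minimal sketch of how shape bias is commonly computed on cue-conflict images (images whose shape comes from one class and whose texture comes from another). It assumes the widely used definition, the fraction of shape decisions among predictions that match either cue; the summary above does not state the exact protocol the paper uses, and the function name `shape_bias` and the sample labels are hypothetical.

```python
def shape_bias(predictions, shape_labels, texture_labels):
    """Fraction of shape decisions among predictions matching either cue.

    Assumed definition (common cue-conflict protocol), not taken from the
    paper summary: shape_hits / (shape_hits + texture_hits).
    """
    shape_hits = 0
    texture_hits = 0
    for pred, shape, texture in zip(predictions, shape_labels, texture_labels):
        if shape == texture:
            continue  # not a cue-conflict image, so it carries no signal
        if pred == shape:
            shape_hits += 1
        elif pred == texture:
            texture_hits += 1
        # predictions matching neither cue are ignored
    decided = shape_hits + texture_hits
    if decided == 0:
        return float("nan")  # undefined when no prediction matches a cue
    return shape_hits / decided


# Hypothetical labels for five cue-conflict images.
preds = ["cat", "elephant", "car", "cat", "bird"]
shapes = ["cat", "cat", "car", "cat", "clock"]
textures = ["elephant", "elephant", "dog", "dog", "bottle"]
print(shape_bias(preds, shapes, textures))  # 3 shape vs 1 texture decision -> 0.75
```

A value near 1 means the model mostly follows shape, and a value near 0 means it mostly follows texture. As the findings above note, this number alone is not a reliable estimate of generalization.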
Abstract
Recent advancements in deep learning have been primarily driven by the use of large models trained on increasingly vast datasets. While neural scaling laws have emerged to predict network performance given a specific level of computational resources, the growing demand for expansive datasets raises concerns. To address this, a new research direction has emerged, focusing on the creation of synthetic data as a substitute. In this study, we investigate how neural networks exhibit shape bias during training on synthetic datasets, serving as an indicator of the synthetic data quality. Specifically, our findings indicate three key points: (1) Shape bias varies across network architectures and types of supervision, casting doubt on its reliability as a predictor for generalization and its ability to explain differences in model recognition compared to human capabilities. (2) Relying solely on…
Taxonomy
Topics: Explainable Artificial Intelligence (XAI) · Domain Adaptation and Few-Shot Learning · Adversarial Robustness in Machine Learning
