The Effect of Intrinsic Dataset Properties on Generalization: Unraveling Learning Differences Between Natural and Medical Images
Nicholas Konz, Maciej A. Mazurowski

TL;DR
This paper explores how intrinsic properties of datasets, such as intrinsic dimension and label sharpness, affect neural network generalization, robustness, and learning differences between natural and medical images, supported by theoretical and empirical analysis.
Contribution
It introduces a generalization scaling law based on intrinsic dataset properties and links label sharpness to adversarial vulnerability, extending the analysis to learned representations.
Findings
Generalization error increases with the intrinsic dimension of the training data.
Medical images exhibit higher label sharpness, affecting learning.
Higher label sharpness correlates with increased adversarial vulnerability.
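The sketch below makes "label sharpness" concrete by treating it as an empirical Lipschitz constant of the labeling function. That is the largest ratio of label difference to input distance over all pairs of training examples. This definition, the hypothetical helper `empirical_label_sharpness`, and the toy data are assumptions for illustration. They are not necessarily the paper's exact estimator.

```python
import itertools
import math


def empirical_label_sharpness(xs, ys):
    """Largest |y_j - y_k| / ||x_j - x_k|| over all pairs of examples.

    Hypothetical helper (an assumption, not the paper's code): it treats
    label sharpness as an empirical Lipschitz constant of the map x -> y.
    """
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    best = 0.0
    for (x_j, y_j), (x_k, y_k) in itertools.combinations(zip(xs, ys), 2):
        dist = math.dist(x_j, x_k)  # Euclidean distance between inputs
        if dist == 0.0:
            if y_j != y_k:
                return math.inf  # identical inputs, different labels: unbounded
            continue  # identical inputs, identical labels: ratio undefined, skip
        best = max(best, abs(y_j - y_k) / dist)
    return best


if __name__ == "__main__":
    # Toy 1-D data: labels flip only across a wide gap ("smooth")...
    smooth = empirical_label_sharpness([(0.0,), (0.1,), (1.0,), (1.1,)], [0, 0, 1, 1])
    # ...versus labels flipping between inputs only 0.1 apart ("sharp").
    sharp = empirical_label_sharpness([(0.0,), (0.1,), (0.2,), (0.3,)], [0, 1, 0, 1])
    print(smooth, sharp)
```

On this toy data, the "sharp" set scores about 10 and the "smooth" set about 1.11. The pairwise loop is O(n²), so on a real image dataset one would likely subsample pairs.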
Abstract
This paper investigates discrepancies in how neural networks learn from different imaging domains, which are commonly overlooked when adopting computer vision techniques from the domain of natural images to other specialized domains such as medical images. Recent works have found that the generalization error of a trained network typically increases with the intrinsic dimension of its training set. Yet, the steepness of this relationship varies significantly between medical (radiological) and natural imaging domains, with no existing theoretical explanation. We address this gap in knowledge by establishing and empirically validating a generalization scaling law with respect to intrinsic dimension, and propose that the substantial scaling discrepancy between the two considered domains may be at least partially attributed to the higher intrinsic "label sharpness"…
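The abstract relates generalization error to the intrinsic dimension of the training set. One common way to estimate intrinsic dimension is the Levina-Bickel maximum-likelihood estimator over the k nearest neighbors of each point. The sketch below is a plain-Python version of it, assuming distinct points and Euclidean distance. The function name `mle_intrinsic_dimension`, the default k, and the choice to average per-point estimates are assumptions. The paper's exact estimator and settings may differ.

```python
import math
import random


def mle_intrinsic_dimension(points, k=20):
    """Levina-Bickel MLE of intrinsic dimension, averaged over all points.

    Assumes the points are distinct (a zero neighbor distance would make the
    log undefined) and that there are more than k of them.
    """
    n = len(points)
    if n <= k:
        raise ValueError("need more points than neighbors k")
    estimates = []
    for i, x in enumerate(points):
        # Brute-force Euclidean distances to all other points, sorted ascending.
        dists = sorted(math.dist(x, y) for j, y in enumerate(points) if j != i)
        nearest = dists[:k]  # T_1 <= ... <= T_k
        t_k = nearest[-1]
        # Mean log-ratio of the k-th neighbor distance to each closer neighbor.
        mean_log = sum(math.log(t_k / t_j) for t_j in nearest[:-1]) / (k - 1)
        estimates.append(1.0 / mean_log)  # per-point dimension estimate
    return sum(estimates) / n


if __name__ == "__main__":
    rng = random.Random(0)
    # A 2-D plane and a 1-D line, both embedded in a 5-D ambient space.
    plane = [(rng.random(), rng.random(), 0.0, 0.0, 0.0) for _ in range(500)]
    line = [(rng.random(), 0.0, 0.0, 0.0, 0.0) for _ in range(500)]
    print(mle_intrinsic_dimension(plane), mle_intrinsic_dimension(line))
```

With this seeded toy data, the plane's estimate should come out near 2 and the line's near 1, even though both live in a 5-dimensional ambient space. The brute-force neighbor search is O(n²). For image datasets, a k-d tree or an approximate nearest-neighbor index would be the practical choice.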
Taxonomy
Topics: Adversarial Robustness in Machine Learning · COVID-19 diagnosis using AI · Domain Adaptation and Few-Shot Learning
Methods: Sparse Evolutionary Training
