The Effect of Intrinsic Dataset Properties on Generalization: Unraveling Learning Differences Between Natural and Medical Images

Nicholas Konz; Maciej A. Mazurowski

arXiv:2401.08865·cs.CV·February 22, 2024·1 cites

TL;DR

This paper explores how intrinsic properties of datasets, like intrinsic dimension and label sharpness, affect neural network generalization, robustness, and learning differences between natural and medical images, supported by theoretical and empirical analysis.

Contribution

It introduces a generalization scaling law based on intrinsic dataset properties and links label sharpness to adversarial vulnerability, extending the analysis to learned representations.

Findings

01

Generalization error scales with intrinsic data dimension.

02

Medical images exhibit higher label sharpness, affecting learning.

03

Higher label sharpness correlates with increased adversarial vulnerability.

Abstract

This paper investigates discrepancies in how neural networks learn from different imaging domains, which are commonly overlooked when adopting computer vision techniques from the domain of natural images to other specialized domains such as medical images. Recent works have found that the generalization error of a trained network typically increases with the intrinsic dimension ($d_{data}$) of its training set. Yet, the steepness of this relationship varies significantly between medical (radiological) and natural imaging domains, with no existing theoretical explanation. We address this gap in knowledge by establishing and empirically validating a generalization scaling law with respect to $d_{data}$, and propose that the substantial scaling discrepancy between the two considered domains may be at least partially attributed to the higher intrinsic "label sharpness" ($K_{F}$)…
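
To make the notion of intrinsic dimension concrete, the following is a minimal sketch of one common way to estimate it from samples: the Levina-Bickel maximum-likelihood estimator, which uses nearest-neighbor distances. It is an illustration only and not necessarily the authors' exact pipeline; the function name `mle_intrinsic_dimension` and the neighbor count `k` are assumptions introduced here.

```python
import numpy as np


def mle_intrinsic_dimension(X, k=20):
    """Levina-Bickel MLE of intrinsic dimension (illustrative sketch).

    X: array of shape (n_samples, n_features), e.g. flattened images.
    k: number of nearest neighbors per point (hypothetical default).
    Assumes no duplicate points, since zero distances break the log ratio.
    """
    X = np.asarray(X, dtype=np.float64)

    # Pairwise squared Euclidean distances via the Gram-matrix identity.
    sq_norms = np.sum(X ** 2, axis=1)
    d2 = sq_norms[:, None] + sq_norms[None, :] - 2.0 * (X @ X.T)
    np.maximum(d2, 0.0, out=d2)      # clip tiny negatives from round-off
    np.fill_diagonal(d2, np.inf)     # a point is not its own neighbor

    # Distances to the k nearest neighbors, sorted ascending per row.
    knn = np.sqrt(np.sort(d2, axis=1)[:, :k])

    # Per-point estimate: (k - 1) / sum_{j<k} log(T_k / T_j).
    log_ratios = np.log(knn[:, -1:] / knn[:, :-1])
    per_point = (k - 1) / np.sum(log_ratios, axis=1)

    # Average the local estimates over the dataset.
    return float(np.mean(per_point))


if __name__ == "__main__":
    # Synthetic check: a 2-D Gaussian cloud linearly embedded in 10-D.
    rng = np.random.default_rng(0)
    latent = rng.normal(size=(800, 2))
    embedding = rng.normal(size=(2, 10))
    print(mle_intrinsic_dimension(latent @ embedding, k=15))
```

On the synthetic data in the usage example, the estimate should land near the latent dimension of 2 even though the ambient dimension is 10, which is the sense in which $d_{data}$ can be far smaller than the pixel count of an image. The dense pairwise-distance matrix keeps the sketch short but scales quadratically with the number of samples, so larger datasets would call for a nearest-neighbor index instead.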

Code & Models

Repositories

mazurowski-lab/intrinsic-properties
PyTorch · Official

Taxonomy

Topics: Adversarial Robustness in Machine Learning · COVID-19 diagnosis using AI · Domain Adaptation and Few-Shot Learning

Methods: Sparse Evolutionary Training