TL;DR
This study empirically evaluates 25 complexity measures to understand their correlation with deep learning generalization in medical image analysis, highlighting PAC-Bayes and path norm measures and the benefits of multi-task learning.
Contribution
The paper provides the first comprehensive empirical analysis of complexity measures for deep learning generalization in medical imaging, specifically breast ultrasound.
Findings
PAC-Bayes flatness and path norm measures best explain generalization.
Multi-task learning improves model generalization.
Each complexity measure is assessed by its empirical correlation with generalization performance.
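To make the path norm named in the findings concrete, here is a minimal sketch for a small bias-free ReLU network. It uses the well-known trick of pushing an all-ones vector through the network with element-wise squared weights. The function name `path_norm`, the row-major weight layout, and the toy weights are assumptions for illustration, not the paper's implementation.

```python
import math

def path_norm(weights):
    """L2 path norm of a bias-free feedforward network (illustrative sketch).

    `weights` is a list of layer matrices, each given as a list of rows with
    shape (out_units, in_units). This layout is an assumption of the sketch.
    """
    # Start with an all-ones vector, one entry per input unit.
    v = [1.0] * len(weights[0][0])
    for W in weights:
        # Multiply by the element-wise squared weight matrix.
        v = [sum(w * w * x for w, x in zip(row, v)) for row in W]
    # Sum over output units, then take the square root.
    return math.sqrt(sum(v))

# Hypothetical 2-2-1 network, not taken from the paper.
toy = [[[1.0, 2.0], [3.0, 4.0]], [[1.0, 1.0]]]
print(path_norm(toy))  # sqrt(30), roughly 5.477
```

The result equals the square root of the sum, over every input-to-output path, of the product of squared weights along that path, which is why a single forward pass is enough.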
Abstract
The generalization performance of deep learning models for medical image analysis often degrades on images acquired with different devices, device settings, or patient populations. A better understanding of how well models generalize to new images is crucial for clinicians' trust in deep learning. Although significant research effort has recently been directed toward establishing generalization bounds and complexity measures, there is often still a significant discrepancy between predicted and actual generalization performance. Moreover, related large empirical studies have been validated primarily on general-purpose image datasets. This paper presents an empirical study that investigates the correlation between 25 complexity measures and the generalization abilities of supervised deep learning classifiers for breast ultrasound images.…
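As a rough illustration of what correlating a complexity measure with generalization can look like, the sketch below computes Kendall's tau rank correlation between one measure and the generalization gap across several trained models. The text above does not say which correlation coefficient the study uses, so Kendall's tau is an assumption here, and the numbers are made up for the example.

```python
def kendall_tau(xs, ys):
    """Kendall's tau-a rank correlation between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            # A pair is concordant when both quantities move in the same direction.
            s = (xs[i] - xs[j]) * (ys[i] - ys[j])
            if s > 0:
                concordant += 1
            elif s < 0:
                discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)

# Hypothetical values for four trained models (not from the paper):
complexity = [0.8, 1.5, 2.1, 3.0]   # value of one complexity measure per model
gap = [0.02, 0.05, 0.04, 0.09]      # train accuracy minus test accuracy per model
print(kendall_tau(complexity, gap))  # 5 concordant, 1 discordant pair: 4/6
```

A value near 1 would mean the measure ranks models in the same order as their generalization gap, and a value near 0 would mean the measure carries little ranking information.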
