TL;DR
This paper investigates how the duration of self-supervised learning (SSL) pretraining affects model confidence and the ability to abstain in medical image screening, emphasizing reliability rather than accuracy alone.
Contribution
It analyzes how SSL pretraining length affects confidence calibration and abstention, and shows how that in turn affects model reliability in safety-critical tasks.
Findings
SSL pretraining improves selective prediction over training from scratch.
Longer pretraining does not always enhance reliability once accuracy saturates.
Abstention-aware evaluation is crucial for assessing model safety in medical screening.
Abstract
Self-supervised learning (SSL) is now a standard way to pretrain medical image models, but performance is still mostly judged by downstream accuracy. For safety-critical screening tasks such as diabetic retinopathy grading, this is not enough: a model must also know when its predictions are unreliable and defer uncertain cases for clinical review. In this work, we examine how the length of SSL pretraining influences calibrated confidence and confidence-based abstention. We evaluate multiple SSL checkpoints under a fixed fine-tuning protocol and assess calibrated confidence, coverage, selective accuracy, and selective macro-F1. Across datasets and data regimes, SSL pretraining improves selective prediction compared to training from scratch. Unlike prior SSL studies that primarily evaluate downstream accuracy or AUROC, we analyze how SSL pretraining duration influences confidence behavior…
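To make the abstention metrics named in the abstract concrete, here is a minimal sketch of how coverage, selective accuracy, and selective macro-F1 might be computed from class probabilities. It is an illustration under stated assumptions, not the paper's evaluation code: the function name `selective_metrics`, the `threshold` parameter, the max-probability confidence rule, and the convention of skipping classes absent from the retained set are all hypothetical choices.

```python
from typing import Dict, List, Sequence


def selective_metrics(
    probs: Sequence[Sequence[float]],
    labels: Sequence[int],
    threshold: float,
) -> Dict[str, float]:
    """Coverage, selective accuracy, and selective macro-F1 at one threshold.

    probs: per-sample class probabilities (assumed already calibrated).
    labels: integer ground-truth classes.
    threshold: hypothetical cut-off; a sample whose top probability falls
        below it is deferred (abstained on) instead of predicted.
    """
    num_classes = len(probs[0])
    kept_preds: List[int] = []
    kept_labels: List[int] = []
    for p, y in zip(probs, labels):
        if max(p) >= threshold:  # confidence = top-class probability (assumption)
            kept_preds.append(max(range(num_classes), key=lambda k: p[k]))
            kept_labels.append(y)

    # Coverage: fraction of samples the model still answers.
    coverage = len(kept_preds) / len(labels)
    if not kept_preds:
        nan = float("nan")
        return {"coverage": 0.0, "selective_accuracy": nan, "selective_macro_f1": nan}

    # Selective accuracy: accuracy restricted to the retained samples.
    correct = sum(1 for p, y in zip(kept_preds, kept_labels) if p == y)
    accuracy = correct / len(kept_preds)

    # Selective macro-F1: unweighted mean of per-class F1 on retained samples.
    f1_scores: List[float] = []
    for c in range(num_classes):
        tp = sum(1 for p, y in zip(kept_preds, kept_labels) if p == c and y == c)
        fp = sum(1 for p, y in zip(kept_preds, kept_labels) if p == c and y != c)
        fn = sum(1 for p, y in zip(kept_preds, kept_labels) if p != c and y == c)
        if tp + fp + fn == 0:
            continue  # class absent from retained predictions and labels
        f1_scores.append(2 * tp / (2 * tp + fp + fn))
    macro_f1 = sum(f1_scores) / len(f1_scores)

    return {
        "coverage": coverage,
        "selective_accuracy": accuracy,
        "selective_macro_f1": macro_f1,
    }


# Toy data for illustration only; these are not results from the paper.
toy_probs = [[0.9, 0.1], [0.6, 0.4], [0.2, 0.8], [0.55, 0.45]]
toy_labels = [0, 1, 1, 0]
strict = selective_metrics(toy_probs, toy_labels, threshold=0.7)
lenient = selective_metrics(toy_probs, toy_labels, threshold=0.0)
```

Sweeping `threshold` over a grid and recording the resulting pairs of coverage and selective accuracy traces a risk-coverage style curve, which is one way checkpoints from different pretraining lengths could be compared under a fixed fine-tuning protocol.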
