How stable are Transferability Metrics evaluations?
Andrea Agostinelli, Michal Pándy, Jasper Uijlings, Thomas Mensink, Vittorio Ferrari

TL;DR
This study systematically evaluates the stability of transferability metrics across diverse experimental setups, revealing that small variations can lead to different conclusions, and proposes aggregated evaluation methods for more reliable assessments.
Contribution
It introduces a large-scale systematic analysis of transferability metrics, demonstrating the impact of experimental variations and proposing aggregation techniques for more consistent evaluation.
Findings
LogME is the best metric for selecting good source datasets to transfer from in semantic segmentation.
NLEEP is best for source architecture selection in image classification.
GBC effectively identifies beneficial source-target task pairs.
Abstract
Transferability metrics is a maturing field with increasing interest, which aims at providing heuristics for selecting the most suitable source models to transfer to a given target dataset, without fine-tuning them all. However, existing works rely on custom experimental setups which differ across papers, leading to inconsistent conclusions about which transferability metrics work best. In this paper we conduct a large-scale study by systematically constructing a broad range of 715k experimental setup variations. We discover that even small variations to an experimental setup lead to different conclusions about the superiority of a transferability metric over another. Then we propose better evaluations by aggregating across many experiments, which enables more stable conclusions. As a result, we reveal the superiority of LogME at selecting good source datasets to transfer from in a…
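The idea of aggregating across many experiments can be illustrated with a minimal sketch. It is not the authors' evaluation protocol: the choice of Kendall's tau as the per-experiment measure, the mean and win rate as aggregates, the metric names `metric_a` and `metric_b`, and all numbers are assumptions made up for illustration. Each toy experiment ranks four source models by a metric's score and compares that ranking with the accuracy reached after fine-tuning.

```python
from itertools import combinations
from statistics import mean


def kendall_tau(xs, ys):
    """Kendall tau-a: (concordant pairs - discordant pairs) / all pairs."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long sequences with >= 2 items")
    balance = 0
    for i, j in combinations(range(len(xs)), 2):
        product = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if product > 0:
            balance += 1      # pair ordered the same way by both sequences
        elif product < 0:
            balance -= 1      # pair ordered in opposite ways
    return balance / (len(xs) * (len(xs) - 1) // 2)


def per_experiment_taus(experiments):
    """Return one {metric_name: tau} dict per experiment."""
    return [
        {name: kendall_tau(scores, exp["accuracy"])
         for name, scores in exp["scores"].items()}
        for exp in experiments
    ]


def aggregate(taus_per_experiment):
    """Summarise each metric by its mean tau and the share of experiments it wins."""
    names = taus_per_experiment[0].keys()
    summary = {}
    for name in names:
        taus = [taus[name] for taus in taus_per_experiment]
        wins = sum(taus[name] == max(taus.values()) for taus in taus_per_experiment)
        summary[name] = {
            "mean_tau": mean(taus),
            "win_rate": wins / len(taus_per_experiment),
        }
    return summary


# Hypothetical toy data: accuracy after fine-tuning four source models,
# and the scores two made-up metrics assigned to those models beforehand.
experiments = [
    {"accuracy": [0.60, 0.70, 0.80, 0.90],
     "scores": {"metric_a": [1, 2, 3, 4], "metric_b": [1, 3, 2, 4]}},
    {"accuracy": [0.55, 0.65, 0.75, 0.85],
     "scores": {"metric_a": [2, 1, 3, 4], "metric_b": [1, 2, 3, 4]}},
    {"accuracy": [0.50, 0.60, 0.70, 0.80],
     "scores": {"metric_a": [1, 2, 3, 4], "metric_b": [4, 3, 2, 1]}},
]

per_experiment = per_experiment_taus(experiments)
summary = aggregate(per_experiment)
```

In this toy data the second experiment on its own would favour `metric_b`, while the aggregate over all three favours `metric_a`. This is the kind of instability that a single-setup evaluation can hide.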
Taxonomy
Topics: Domain Adaptation and Few-Shot Learning · COVID-19 diagnosis using AI · Adversarial Robustness in Machine Learning
