TL;DR
This paper introduces Asymmetric Duos: a large model is paired with a much smaller, less accurate "sidekick" model, and their predictions are combined by learned weighted averaging. The pairing improves the large model's uncertainty estimates and downstream decisions with minimal extra computation.
Contribution
The paper proposes an asymmetric two-model ensemble in which a small sidekick model improves a large model's uncertainty quantification and accuracy at a fraction of the large model's computational cost.
Findings
Accuracy and uncertainty metrics improve significantly across five image classification benchmarks.
The small sidekick model almost never harms the large model's performance.
The method requires only 10-20% additional computation.
Abstract
The go-to strategy to apply deep networks in settings where uncertainty informs decisions--ensembling multiple training runs with random initializations--is ill-suited for the extremely large-scale models and practical fine-tuning workflows of today. We introduce a new cost-effective strategy for improving the uncertainty quantification and downstream decisions of a large model (e.g. a fine-tuned ViT-B): coupling it with a less accurate but much smaller "sidekick" (e.g. a fine-tuned ResNet-34) with a fraction of the computational cost. We propose aggregating the predictions of this Asymmetric Duo by simple learned weighted averaging. Surprisingly, despite their inherent asymmetry, the sidekick model almost never harms the performance of the larger model. In fact, across five image classification benchmarks and a variety of model architectures and training schemes (including soups),…
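The abstract describes the aggregation only as "simple learned weighted averaging" and does not spell out the weighting scheme. As a minimal sketch of one plausible reading, the example below assumes that each model's logits are scaled by a scalar weight, summed, and passed through a softmax, and that the two weights are chosen by grid search to minimize negative log-likelihood on held-out data. The function names (`duo_probs`, `mean_nll`, `fit_weights`), the grid, and the fitting procedure are illustrative assumptions, not the authors' implementation.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def duo_probs(large_logits, small_logits, w_large, w_small):
    """Class probabilities from a weighted sum of the two models' logits.

    Assumption: the weights act on logits; the source only says
    "learned weighted averaging".
    """
    combined = [w_large * a + w_small * b
                for a, b in zip(large_logits, small_logits)]
    return softmax(combined)


def mean_nll(large_batch, small_batch, labels, w_large, w_small):
    """Average negative log-likelihood of the true labels under the duo."""
    total = 0.0
    for zl, zs, y in zip(large_batch, small_batch, labels):
        total -= math.log(duo_probs(zl, zs, w_large, w_small)[y])
    return total / len(labels)


def fit_weights(large_batch, small_batch, labels, grid=None):
    """Pick (w_large, w_small) from a grid by minimizing held-out NLL.

    The default grid (0.0 to 2.0 in steps of 0.25) is a hypothetical
    choice. It contains (1.0, 0.0), i.e. the large model alone, so the
    fitted duo is never worse than the large model on the data used
    for fitting.
    """
    grid = grid or [i / 4 for i in range(9)]
    return min(
        ((wl, ws) for wl in grid for ws in grid),
        key=lambda w: mean_nll(large_batch, small_batch, labels, *w),
    )
```

In use, `fit_weights` would be called once on validation logits from both models, and `duo_probs` would then be applied to each test input with the fitted weights. Because the large model alone is one of the candidate weightings, the fitted pair can fall back to it when the sidekick does not help on the validation data.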