Better Aggregation in Test-Time Augmentation
Divya Shanmugam, Davis Blalock, Guha Balakrishnan, John Guttag

TL;DR
This paper analyzes the limitations of simple averaging in test-time augmentation for image classification and introduces a learning-based aggregation method that improves accuracy across various models and datasets.
Contribution
The paper provides experimental analyses of when and why simple averaging is suboptimal, and proposes a learning-based approach to aggregating test-time augmentations.
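The sketch below makes the contrast concrete. It compares the standard unweighted mean with one simple learned alternative: a single non-negative weight per augmentation, fit on held-out data. This is a minimal, hypothetical illustration and not the authors' method. The function names (`mean_aggregate`, `weighted_aggregate`, `fit_aug_weights`), the softmax parameterization of the weights, the learning rate, and the toy probabilities are all assumptions.

```python
import math
from typing import List, Sequence

Probs = Sequence[float]  # class probabilities for one augmented view of an input


def argmax(v: Sequence[float]) -> int:
    return max(range(len(v)), key=lambda i: v[i])


def softmax(theta: Sequence[float]) -> List[float]:
    m = max(theta)  # subtract the max for numerical stability
    exps = [math.exp(t - m) for t in theta]
    z = sum(exps)
    return [e / z for e in exps]


def mean_aggregate(views: Sequence[Probs]) -> List[float]:
    """Standard TTA: unweighted mean of class probabilities over views."""
    n = len(views)
    return [sum(v[c] for v in views) / n for c in range(len(views[0]))]


def weighted_aggregate(views: Sequence[Probs], weights: Sequence[float]) -> List[float]:
    """Weighted mean over views; weights assumed non-negative and summing to 1."""
    return [sum(w * v[c] for w, v in zip(weights, views)) for c in range(len(views[0]))]


def fit_aug_weights(val_views: Sequence[Sequence[Probs]], labels: Sequence[int],
                    lr: float = 1.0, steps: int = 200) -> List[float]:
    """Hypothetical learner: w = softmax(theta), one entry per augmentation,
    fit by gradient descent on the mean validation loss -log(sum_a w_a * p_a[y])."""
    n_aug = len(val_views[0])
    theta = [0.0] * n_aug
    for _ in range(steps):
        w = softmax(theta)
        grad = [0.0] * n_aug
        for views, y in zip(val_views, labels):
            q = sum(w[a] * views[a][y] for a in range(n_aug))  # aggregated prob. of true class
            for a in range(n_aug):
                # Gradient of -log q with respect to theta_a, through the softmax.
                grad[a] += w[a] * (1.0 - views[a][y] / q)
        theta = [t - lr * g / len(labels) for t, g in zip(theta, grad)]
    return softmax(theta)


# Toy validation set (made up): view 0 (e.g. the unmodified image) is right with
# probability 0.8; view 1 is a harmful augmentation that is confidently wrong.
val_views = [
    [[0.8, 0.2], [0.1, 0.9]],  # true label 0
    [[0.2, 0.8], [0.9, 0.1]],  # true label 1
]
labels = [0, 1]
w = fit_aug_weights(val_views, labels)

test_views = [[0.8, 0.2], [0.1, 0.9]]  # true label 0
print(argmax(mean_aggregate(test_views)))         # 1: the simple average is wrong
print(argmax(weighted_aggregate(test_views, w)))  # 0: the learned weights discount view 1
```

Constraining the weights to sum to one keeps the aggregate a valid probability vector. The toy case shows how a single harmful augmentation can pull the simple average toward the wrong class, while a learned weighting discounts that augmentation.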
Findings
Learning-based aggregation outperforms simple averaging.
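Test-time augmentation can turn many correct predictions into incorrect ones, even when it improves net accuracy, and it can sometimes reduce overall accuracy.
The learned aggregation is effective across multiple models, datasets, and augmentations.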
Abstract
Test-time augmentation -- the aggregation of predictions across transformed versions of a test input -- is a common practice in image classification. Traditionally, predictions are combined using a simple average. In this paper, we present 1) experimental analyses that shed light on cases in which the simple average is suboptimal and 2) a method to address these shortcomings. A key finding is that even when test-time augmentation produces a net improvement in accuracy, it can change many correct predictions into incorrect predictions. We delve into when and why test-time augmentation changes a prediction from being correct to incorrect and vice versa. Building on these insights, we present a learning-based method for aggregating test-time augmentations. Experiments across a diverse set of models, datasets, and augmentations show that our method delivers consistent improvements over…
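The abstract's key finding is that a net accuracy gain can hide many correct-to-incorrect changes. This is straightforward to measure for any model. The sketch below is minimal and assumes you already have per-example predicted labels with and without TTA. `flip_counts` is a hypothetical helper, not code from the paper, and the toy labels are invented for illustration.

```python
from collections import Counter
from typing import Sequence


def flip_counts(base_preds: Sequence[int], tta_preds: Sequence[int],
                labels: Sequence[int]) -> Counter:
    """Tally how each prediction changes between the un-augmented model and TTA."""
    counts: Counter = Counter()
    for b, t, y in zip(base_preds, tta_preds, labels):
        before = "correct" if b == y else "incorrect"
        after = "correct" if t == y else "incorrect"
        counts[f"{before}->{after}"] += 1
    return counts


# Toy predictions (made up) for 8 examples.
labels = [0, 1, 2, 3, 0, 1, 2, 3]
base_preds = [0, 1, 2, 0, 1, 1, 0, 3]  # 5/8 correct without TTA
tta_preds = [0, 2, 2, 3, 0, 1, 2, 0]   # 6/8 correct with TTA

counts = flip_counts(base_preds, tta_preds, labels)
net_gain = counts["incorrect->correct"] - counts["correct->incorrect"]
print(counts["correct->incorrect"], counts["incorrect->correct"], net_gain)  # 2 3 1
```

In this toy case TTA gains one correct prediction overall but breaks two that were previously correct. This matches the pattern the abstract describes, where the net change equals the fixes minus the breaks.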
