TL;DR
This paper introduces learned image resizers that are trained jointly with vision models. Replacing traditional resizing methods with a CNN-based learned resizer improves classification accuracy on ImageNet.
Contribution
The paper proposes a CNN-based learned image resizer that improves task performance over traditional resizers and demonstrates its effectiveness across multiple vision tasks and models.
Findings
Learned resizers outperform traditional bilinear/bicubic resizers in classification accuracy.
Joint training of resizer and vision model leads to consistent performance improvements.
Learned resizers are adaptable to different models and tasks, including fine-tuning for other vision applications.
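To make the contrast between a fixed and a learned resizer concrete, here is a minimal NumPy sketch. It assumes one simple design that is not taken from the paper: a bilinear resize acts as the baseline, and a single 3x3 filter adds a learnable correction on top of it. The function names (`bilinear_resize`, `conv3x3`, `learned_resize`) and the single-layer structure are hypothetical; this summary only states that the actual resizer is CNN-based. No training loop is shown, so the kernel here stands in for weights that joint training would update through the task loss.

```python
import numpy as np

def bilinear_resize(img, out_h, out_w):
    """Resize an (H, W, C) float image with bilinear interpolation.

    Uses half-pixel centers and no antialiasing; this is the fixed,
    off-the-shelf baseline that a learned resizer would replace.
    """
    in_h, in_w = img.shape[:2]
    # Map output pixel centers back to input coordinates.
    ys = np.clip((np.arange(out_h) + 0.5) * in_h / out_h - 0.5, 0, in_h - 1)
    xs = np.clip((np.arange(out_w) + 0.5) * in_w / out_w - 0.5, 0, in_w - 1)
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, in_h - 1)
    x1 = np.minimum(x0 + 1, in_w - 1)
    wy = (ys - y0)[:, None, None]  # vertical blend weights
    wx = (xs - x0)[None, :, None]  # horizontal blend weights
    top = img[y0][:, x0] * (1 - wx) + img[y0][:, x1] * wx
    bot = img[y1][:, x0] * (1 - wx) + img[y1][:, x1] * wx
    return top * (1 - wy) + bot * wy

def conv3x3(img, kernel):
    """Same-padded 3x3 cross-correlation; kernel shape is (3, 3, C_in, C_out)."""
    h, w, _ = img.shape
    padded = np.pad(img, ((1, 1), (1, 1), (0, 0)), mode="edge")
    out = np.zeros((h, w, kernel.shape[3]))
    for dy in range(3):
        for dx in range(3):
            out += padded[dy:dy + h, dx:dx + w] @ kernel[dy, dx]
    return out

def learned_resize(img, out_h, out_w, kernel):
    """Hypothetical learned resizer: bilinear baseline plus a learnable residual.

    In joint training, `kernel` would be optimized together with the
    downstream vision model; here it is just an input array.
    """
    base = bilinear_resize(img, out_h, out_w)
    return base + conv3x3(base, kernel)

# With a zero kernel the learned resizer reduces to plain bilinear resizing.
image = np.random.default_rng(0).random((448, 448, 3))
zero_kernel = np.zeros((3, 3, 3, 3))
small = learned_resize(image, 224, 224, zero_kernel)
```

Starting from a zero kernel means the sketch begins exactly at the bilinear baseline, so any change in accuracy during training would come from the learned correction rather than from a different starting point.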
Abstract
For all the ways convolutional neural nets have revolutionized computer vision in recent years, one important aspect has received surprisingly little attention: the effect of image size on the accuracy of tasks being trained for. Typically, to be efficient, the input images are resized to a relatively small spatial resolution (e.g. 224x224), and both training and inference are carried out at this resolution. The actual mechanism for this re-scaling has been an afterthought: namely, off-the-shelf image resizers such as bilinear and bicubic are commonly used in most machine learning software frameworks. But do these resizers limit the on-task performance of the trained networks? The answer is yes. Indeed, we show that the typical linear resizer can be replaced with learned resizers that can substantially improve performance. Importantly, while the classical resizers typically result in…
