Differentiable Patch Selection for Image Recognition

Jean-Baptiste Cordonnier; Aravindh Mahendran; Alexey Dosovitskiy; Dirk Weissenborn; Jakob Uszkoreit; Thomas Unterthiner

arXiv:2104.03059 · cs.CV · April 8, 2021

TL;DR

This paper introduces a differentiable patch selection method using a Top-K operator, enabling efficient high-resolution image processing by focusing on relevant regions, trained end-to-end without bounding box annotations.

Contribution

The paper presents a differentiable patch selection technique that can be placed in front of any downstream neural network, so that high-resolution image recognition only spends compute on the selected patches.

Findings

01. Effective in traffic sign recognition

02. Capable of inter-patch relationship reasoning

03. Achieves fine-grained recognition without bounding boxes

Abstract

Neural Networks require large amounts of memory and compute to process high resolution images, even when only a small part of the image is actually informative for the task at hand. We propose a method based on a differentiable Top-K operator to select the most relevant parts of the input to efficiently process high resolution images. Our method may be interfaced with any downstream neural network, is able to aggregate information from different patches in a flexible way, and allows the whole model to be trained end-to-end using backpropagation. We show results for traffic sign recognition, inter-patch relationship reasoning, and fine-grained recognition without using object/part bounding box annotations during training.
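The abstract does not say how the Top-K operator is made differentiable, so the following is only a minimal sketch under stated assumptions. It relaxes a hard Top-K by averaging one-hot selections over Gaussian-perturbed scores, which is one possible relaxation and not necessarily the authors' exact formulation. The function names (`topk_indicators`, `perturbed_topk`, `extract_patches`), the noise scale `sigma`, and the non-overlapping patch grid are all hypothetical choices made for illustration. Only the forward pass is shown; training end-to-end would additionally need a gradient estimate for the averaged selection.

```python
import numpy as np

def topk_indicators(scores, k):
    """Hard Top-K as a (k, n) one-hot matrix, rows ordered by patch index."""
    idx = np.sort(np.argsort(-scores)[:k])
    out = np.zeros((k, scores.shape[0]))
    out[np.arange(k), idx] = 1.0
    return out

def perturbed_topk(scores, k, num_samples=500, sigma=0.05, seed=0):
    """Soft Top-K: average the hard selection over noisy copies of the scores.

    Assumption: Gaussian noise with scale sigma; each row of the result is a
    probability distribution over the n patches.
    """
    rng = np.random.default_rng(seed)
    noise = rng.standard_normal((num_samples, scores.shape[0]))
    samples = [topk_indicators(scores + sigma * z, k) for z in noise]
    return np.mean(samples, axis=0)

def extract_patches(image, patch, indicators):
    """Return k patches as indicator-weighted sums over a non-overlapping grid."""
    h, w = image.shape
    grid = image.reshape(h // patch, patch, w // patch, patch)
    flat = grid.transpose(0, 2, 1, 3).reshape(-1, patch, patch)
    return np.einsum("kn,nij->kij", indicators, flat)

# Usage: a 4x4 "image" split into four 2x2 patches, keeping the top 2.
# In a real model the scores would come from a small scorer network.
image = np.arange(16, dtype=float).reshape(4, 4)
scores = np.array([0.1, 2.0, 0.3, 3.0])
soft = perturbed_topk(scores, k=2)
selected = extract_patches(image, 2, soft)
print(selected.shape)
```

Because each row of the soft indicator matrix sums to one, the extracted patches are convex combinations of grid patches; when the scores are well separated relative to `sigma`, they coincide with the hard Top-K patches, and a downstream network would then process only those k patches.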
