Classifying a specific image region using convolutional nets with an ROI mask as input
Sagi Eppel

TL;DR
This paper introduces a CNN-based method that uses an ROI mask as an attention mechanism to improve classification accuracy for specific image regions, especially small ones, by integrating the mask early in the network.
Contribution
It proposes a novel approach that incorporates an ROI mask as an attention map in CNNs, enhancing classification of targeted regions with background context.
Findings
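The approach gave superior results on both the COCO object dataset and the OpenSurfaces materials dataset.
Merging the attention map into the first layer of the net gave better results than merging it later in the network.
The method is most beneficial for small regions, whose classification depends on contextual cues from the background.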
Abstract
Convolutional neural nets (CNN) are the leading computer vision method for classifying images. In some cases, it is desirable to classify only a specific region of the image that corresponds to a certain object. Hence, assuming that the region of the object in the image is known in advance and is given as a binary region of interest (ROI) mask, the goal is to classify the object in this region using a convolutional neural net. This goal is achieved using a standard image classification net with the addition of a side branch, which converts the ROI mask into an attention map. This map is then combined with the image classification net. This allows the net to focus the attention on the object region while still extracting contextual cues from the background. This approach was evaluated using the COCO object dataset and the OpenSurfaces materials dataset. In both cases, it gave superior…
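The mechanism described in the abstract can be illustrated with a minimal NumPy sketch. It is not the author's implementation: the abstract does not specify how the attention map is merged with the net, so the elementwise multiplication and addition modes below are assumptions, as are the helper names (conv2d_same, roi_attention_features), the 3x3 kernel size, and the random weights that stand in for trained ones. The sketch only shows the data flow: a side branch convolves the binary ROI mask into an attention map with one channel per feature map, and that map is combined with the first-layer features of the image branch.

```python
import numpy as np

def conv2d_same(x, kernels):
    """Naive 'same' cross-correlation with zero padding and no bias.
    x: (C_in, H, W); kernels: (C_out, C_in, k, k) with odd k."""
    c_out, c_in, k, _ = kernels.shape
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    _, h, w = x.shape
    out = np.zeros((c_out, h, w))
    for i in range(k):
        for j in range(k):
            patch = xp[:, i:i + h, j:j + w]  # shifted view, (C_in, H, W)
            out += np.einsum("oc,chw->ohw", kernels[:, :, i, j], patch)
    return out

def roi_attention_features(image, roi_mask, img_kernels, mask_kernels, mode="mul"):
    """Combine first-layer image features with an attention map from the ROI mask.
    The merge operation ('mul' or 'add') is an assumption made for illustration."""
    # Main branch: first convolutional layer of the classification net, with ReLU.
    features = np.maximum(conv2d_same(image, img_kernels), 0.0)
    # Side branch: convolve the binary mask into one attention channel per feature map.
    attention = conv2d_same(roi_mask[None].astype(float), mask_kernels)
    if mode == "mul":
        return features * attention
    if mode == "add":
        return features + attention
    raise ValueError(f"unknown mode: {mode}")

# Usage with random stand-in weights (hypothetical sizes).
rng = np.random.default_rng(0)
image = rng.random((3, 16, 16))              # RGB image, channels first
roi_mask = np.zeros((16, 16))
roi_mask[4:8, 4:8] = 1                       # binary ROI covering the object
img_kernels = rng.normal(size=(8, 3, 3, 3))  # 8 first-layer filters
mask_kernels = rng.normal(size=(8, 1, 3, 3)) # side-branch filters
merged = roi_attention_features(image, roi_mask, img_kernels, mask_kernels)
print(merged.shape)
```

In this sketch the multiplicative merge zeroes the first-layer features away from the ROI, because the side branch has no bias term, while the additive merge leaves background features intact and only shifts activations near the ROI. A trained side branch with a bias could let background context through under either choice.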
Taxonomy
Topics: Advanced Neural Network Applications · Domain Adaptation and Few-Shot Learning · Industrial Vision Systems and Defect Detection
