TL;DR
This paper introduces a deep learning architecture for weakly supervised object detection that leverages pre-trained convolutional neural networks to simultaneously select regions and classify objects, outperforming previous methods.
Contribution
It proposes a novel end-to-end deep detection network that implicitly learns object detectors from image-level labels, improving weakly supervised detection performance.
Findings
Outperforms existing weakly supervised detection systems on PASCAL VOC
Implicitly learns object detectors from image-level classification
Outperforms standard data augmentation and fine-tuning techniques
Abstract
Weakly supervised learning of object detection is an important problem in image understanding that still does not have a satisfactory solution. In this paper, we address this problem by exploiting the power of deep convolutional neural networks pre-trained on large-scale image-level classification tasks. We propose a weakly supervised deep detection architecture that modifies one such network to operate at the level of image regions, performing simultaneously region selection and classification. Trained as an image classifier, the architecture implicitly learns object detectors that are better than alternative weakly supervised detection systems on the PASCAL VOC data. The model, which is a simple and elegant end-to-end architecture, outperforms standard data augmentation and fine-tuning techniques for the task of image-level classification as well.
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
