Weakly Supervised Cascaded Convolutional Networks
Ali Diba, Vivek Sharma, Ali Pazandeh, Hamed Pirsiavash, Luc Van Gool

TL;DR
This paper introduces cascaded convolutional neural network architectures for weakly supervised object detection, classification, and localization, demonstrating improved performance on multiple large-scale datasets without requiring detailed annotations.
Contribution
It proposes novel end-to-end trainable cascaded CNN architectures with two or three stages specifically designed for weak supervision in object detection tasks.
Findings
Improved detection and localization accuracy on PASCAL VOC datasets
Effective weakly supervised learning with end-to-end training
Demonstrated scalability on large-scale datasets like ILSVRC
Abstract
Object detection is a challenging task in the visual understanding domain, and even more so when the supervision is weak. Recently, a few efforts have been made to handle the task without expensive human annotations, using promising deep neural networks. A new architecture of cascaded networks is proposed to learn a convolutional neural network (CNN) under such conditions. We introduce two such architectures, with either two or three cascade stages, which are trained in an end-to-end pipeline. The first stage of both architectures extracts the best candidates among class-specific region proposals by training a fully convolutional network. In the case of the three-stage architecture, the middle stage provides object segmentation, using the activation maps output by the first stage. The final stage of both architectures is a part of a convolutional neural network that performs multiple instance…
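The abstract names three mechanisms: picking class-specific region proposals from a fully convolutional network's activation map, deriving a segmentation from that map, and multiple instance learning (MIL) over proposals with only image-level labels. The following is a toy NumPy sketch of those ideas under stated assumptions, not the authors' implementation. The scoring rule (mean activation inside a box minus mean activation outside it), the 0.5 relative threshold for segmentation, and max-pooling as the MIL bag score are hypothetical choices, and every function name is illustrative. In the paper these stages are CNN layers trained end-to-end; here they operate on a fixed activation map and fixed logits.

```python
import numpy as np


def score_proposals(cam, boxes):
    """Hypothetical rule: mean activation inside each box minus mean outside.

    cam:   (H, W) class activation map for one class (assumed input).
    boxes: (N, 4) integer array of (x0, y0, x1, y1), end-exclusive.
    """
    cam = np.asarray(cam, dtype=float)
    total_sum, total_count = cam.sum(), cam.size
    scores = np.empty(len(boxes))
    for i, (x0, y0, x1, y1) in enumerate(boxes):
        inside = cam[y0:y1, x0:x1]
        in_sum, in_count = inside.sum(), inside.size
        out_count = total_count - in_count
        in_mean = in_sum / in_count if in_count else 0.0
        # A box covering the whole map has no "outside"; treat its mean as 0.
        out_mean = (total_sum - in_sum) / out_count if out_count else 0.0
        scores[i] = in_mean - out_mean
    return scores


def select_top_proposals(cam, boxes, k):
    """Indices of the k highest-scoring boxes, best first (stage-1 analogue)."""
    scores = score_proposals(cam, boxes)
    return np.argsort(-scores, kind="stable")[:k]


def segment_from_cam(cam, thresh_ratio=0.5):
    """Foreground mask: pixels >= thresh_ratio * max activation (stage-2 analogue)."""
    cam = np.asarray(cam, dtype=float)
    return cam >= thresh_ratio * cam.max()


def mil_loss(proposal_logits, label):
    """Binary MIL loss for one class; the image logit is the max over proposals.

    label is 1 if the image-level annotation contains the class, else 0.
    Uses the numerically stable form of sigmoid cross-entropy.
    """
    z = float(np.max(proposal_logits))
    return max(z, 0.0) - z * label + np.log1p(np.exp(-abs(z)))


if __name__ == "__main__":
    cam = np.zeros((10, 10))
    cam[2:5, 2:5] = 1.0  # a single "hot" object region
    boxes = np.array([[2, 2, 5, 5], [0, 0, 10, 10], [6, 6, 9, 9]])
    print(select_top_proposals(cam, boxes, k=1))  # [0]
    print(segment_from_cam(cam).sum())  # 9
    print(round(float(mil_loss([-2.0, 3.0, 0.5], 1)), 4))  # 0.0486
```

With max-pooling as the bag score, only the top-scoring proposal in each image receives a gradient during training, which is one common way MIL ties box-level predictions to image-level labels; other pooling choices would spread the signal differently.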
