CNN: Single-label to Multi-label
Yunchao Wei, Wei Xia, Junshi Huang, Bingbing Ni, Jian Dong, Yao Zhao, Shuicheng Yan

TL;DR
This paper introduces Hypotheses-CNN-Pooling (HCP), a flexible deep learning framework that effectively handles multi-label image classification without requiring bounding box annotations, outperforming existing methods on Pascal VOC datasets.
Contribution
The paper proposes a novel CNN-based infrastructure that aggregates object hypotheses for multi-label classification without needing explicit bounding box labels or hypothesis annotations.
Findings
Achieves 84.2% mAP on VOC2012 with HCP alone.
Outperforms state-of-the-art methods by over 7% mAP.
Is robust to noisy hypotheses and requires no explicit hypothesis labels.
Abstract
Convolutional Neural Network (CNN) has demonstrated promising performance in single-label image classification tasks. However, how CNN best copes with multi-label images still remains an open problem, mainly due to the complex underlying object layouts and insufficient multi-label training images. In this work, we propose a flexible deep CNN infrastructure, called Hypotheses-CNN-Pooling (HCP), where an arbitrary number of object segment hypotheses are taken as the inputs, then a shared CNN is connected with each hypothesis, and finally the CNN output results from different hypotheses are aggregated with max pooling to produce the ultimate multi-label predictions. Some unique characteristics of this flexible deep CNN infrastructure include: 1) no ground truth bounding box information is required for training; 2) the whole HCP infrastructure is robust to possibly noisy and/or redundant…
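The aggregation step described in the abstract (per-hypothesis outputs from a shared network, combined by max pooling) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: `toy_shared_cnn` is a hypothetical stand-in (a single linear layer) for the shared CNN, and the feature size, class count, and function names are assumptions chosen for illustration only.

```python
import numpy as np


def toy_shared_cnn(hypotheses, weights):
    """Hypothetical stand-in for the shared CNN.

    The same weights are applied to every hypothesis, which is the
    "shared" part. A real model would be a deep network, not one matmul.
    hypotheses: (num_hypotheses, feature_dim)
    weights:    (feature_dim, num_classes)
    returns:    (num_hypotheses, num_classes) per-hypothesis class scores
    """
    return hypotheses @ weights


def hcp_max_pool(hypothesis_scores):
    """Cross-hypothesis max pooling.

    For each class, keep the highest score over all hypotheses, so one
    confident hypothesis is enough to predict that label for the image.
    Works for any number of hypotheses >= 1.
    """
    scores = np.asarray(hypothesis_scores, dtype=float)
    if scores.ndim != 2 or scores.shape[0] == 0:
        raise ValueError("expected a non-empty (num_hypotheses, num_classes) array")
    return scores.max(axis=0)


# Assumed toy sizes: 5 hypotheses, 8-dim features, 3 classes.
rng = np.random.default_rng(0)
hypotheses = rng.normal(size=(5, 8))
weights = rng.normal(size=(8, 3))

per_hypothesis = toy_shared_cnn(hypotheses, weights)
image_scores = hcp_max_pool(per_hypothesis)  # one score per class
```

Because the maximum is taken per class, a hypothesis that scores low on every class cannot lower the image-level prediction, which is one way to read the abstract's robustness claim about noisy or redundant hypotheses.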
Taxonomy
Methods: Max Pooling
