TL;DR
This paper introduces the simple-to-complex (STC) framework, which trains deep neural networks for semantic segmentation using only image-level labels. Training proceeds progressively from simple images to complex ones, which reduces annotation costs.
Contribution
The novel STC framework leverages a step-wise approach with saliency maps and image-level annotations to train segmentation networks without pixel-level supervision.
Findings
Outperforms state-of-the-art weakly supervised methods on the PASCAL VOC 2012 segmentation benchmark.
Effectively utilizes only image-level annotations for training.
Achieves competitive segmentation accuracy with reduced annotation effort.
Abstract
Recently, significant improvement has been made on semantic object segmentation due to the development of deep convolutional neural networks (DCNNs). Training such a DCNN usually relies on a large number of images with pixel-level segmentation masks, and annotating these images is very costly in terms of both finance and human effort. In this paper, we propose a simple to complex (STC) framework in which only image-level annotations are utilized to learn DCNNs for semantic segmentation. Specifically, we first train an initial segmentation network called Initial-DCNN with the saliency maps of simple images (i.e., those with a single category of major object(s) and clean background). These saliency maps can be automatically obtained by existing bottom-up salient object detection techniques, where no supervision information is needed. Then, a better network called Enhanced-DCNN is learned…
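The first step described in the abstract, supervising an initial network with saliency maps of simple images, can be illustrated with a small sketch. It turns a saliency map and a single image-level class label into a pseudo segmentation mask. This is not the authors' implementation: the function name `saliency_to_pseudo_mask`, the thresholds `fg_thresh` and `bg_thresh`, and the use of 255 as an "ignore" label are all assumptions made for illustration.

```python
import numpy as np

BACKGROUND = 0
IGNORE = 255  # assumption: 255 marks pixels excluded from the training loss


def saliency_to_pseudo_mask(saliency, class_id, fg_thresh=0.6, bg_thresh=0.2):
    """Build a pseudo segmentation mask for a "simple" image.

    saliency  : 2-D array of values in [0, 1] from any bottom-up
                salient object detector (no supervision needed).
    class_id  : the single image-level label of the image (1..N).
    fg_thresh : hypothetical threshold above which a pixel is
                assigned to the labelled class.
    bg_thresh : hypothetical threshold below which a pixel is
                assigned to background.
    Pixels between the two thresholds are marked IGNORE.
    """
    saliency = np.asarray(saliency, dtype=np.float32)
    if saliency.ndim != 2:
        raise ValueError("saliency must be a 2-D array")
    if not 0.0 <= bg_thresh < fg_thresh <= 1.0:
        raise ValueError("need 0 <= bg_thresh < fg_thresh <= 1")

    # Start with every pixel ignored, then fill in the confident ones.
    mask = np.full(saliency.shape, IGNORE, dtype=np.uint8)
    mask[saliency >= fg_thresh] = class_id
    mask[saliency <= bg_thresh] = BACKGROUND
    return mask


if __name__ == "__main__":
    demo = np.array([[0.9, 0.1],
                     [0.4, 0.7]])
    print(saliency_to_pseudo_mask(demo, class_id=12))
```

Masks produced this way could serve as training targets for an initial segmentation network. Leaving uncertain pixels as "ignore" is one possible design choice for limiting the effect of noisy saliency estimates; the abstract excerpt does not specify how the saliency maps are actually converted into supervision.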
