TL;DR
This paper introduces a weakly-supervised semantic segmentation method that learns from image labels alone by leveraging prior knowledge from image recognition networks, resulting in improved segmentation accuracy without external objectness or saliency models.
Contribution
The paper presents a novel end-to-end trainable framework that generates class-specific segmentation masks from image labels using prior network knowledge, eliminating the need for external saliency algorithms.
Findings
Outperforms recent weakly-supervised methods on PASCAL VOC 2012
Achieves competitive results without external objectness or saliency models
Demonstrates effective joint classification and segmentation learning
Abstract
Training a Convolutional Neural Network (CNN) for semantic segmentation typically requires to collect a large amount of accurate pixel-level annotations, a hard and expensive task. In contrast, simple image tags are easier to gather. With this paper we introduce a novel weakly-supervised semantic segmentation model able to learn from image labels, and just image labels. Our model uses the prior knowledge of a network trained for image recognition, employing these image annotations as an attention mechanism to identify semantic regions in the images. We then present a methodology that builds accurate class-specific segmentation masks from these regions, where neither external objectness nor saliency algorithms are required. We describe how to incorporate this mask generation strategy into a fully end-to-end trainable process where the network jointly learns to classify and segment…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
