TL;DR
This paper introduces a novel method for training semantic segmentation models using coarse positive and negative annotations, reducing labeling effort while maintaining high accuracy, especially with limited detailed labels.
Contribution
It proposes a dual CNN approach with complementary label learning to effectively learn from noisy coarse annotations for semantic segmentation.
Findings
Outperforms state-of-the-art methods on Cityscapes and retinal datasets.
Effective with small ratios of coarse annotations.
Demonstrates robustness to noisy labeling in segmentation tasks.
Abstract
Large annotated datasets are vital for training segmentation models, but pixel-level labeling is time-consuming, error-prone, and often requires scarce expert annotators, especially in medical imaging. In contrast, coarse annotations are quicker, cheaper, and easier to produce, even by non-experts. In this paper, we propose to use coarse drawings from both positive (target) and negative (background) classes in the image, even with noisy pixels, to train a convolutional neural network (CNN) for semantic segmentation. We present a method for learning the true segmentation label distributions from purely noisy coarse annotations using two coupled CNNs. The two CNNs are separated by enforcing high fidelity to the characteristics of the noisy training annotations. We propose to add complementary label learning, which encourages estimation of the negative label distribution. To illustrate the…
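The coupling described in the abstract can be illustrated with a small sketch. This is not the authors' implementation. It only shows the general noise-transition idea the reviews refer to as "estimated confusion matrices": one network outputs a per-pixel true-label distribution, a second outputs a per-pixel confusion matrix, and their product is compared against the coarse annotation. The function names, array shapes, the column-stochastic convention, and the exact form of the complementary loss are all assumptions made for illustration, and random arrays stand in for the CNN outputs.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax along `axis`."""
    shifted = x - x.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)

def noisy_label_distribution(p_true, cm):
    """Map true-label probabilities to noisy-label probabilities.

    p_true: (H, W, C) per-pixel probabilities over true classes.
    cm:     (H, W, C, C) with cm[h, w, i, j] = P(noisy = i | true = j)
            (assumed convention: each column sums to 1).
    """
    return np.einsum("hwij,hwj->hwi", cm, p_true)

def _pick(p, labels):
    """Select p[h, w, labels[h, w]] for every pixel."""
    h, w = labels.shape
    rows = np.arange(h)[:, None]
    cols = np.arange(w)[None, :]
    return p[rows, cols, labels]

def cross_entropy(p, labels, eps=1e-8):
    """Mean of -log p[label]: the pixel is annotated AS class `label`."""
    return float(-np.log(_pick(p, labels) + eps).mean())

def complementary_loss(p, excluded, eps=1e-8):
    """Mean of -log(1 - p[excluded]): the pixel is annotated as NOT class `excluded`."""
    return float(-np.log(1.0 - _pick(p, excluded) + eps).mean())

# Hypothetical usage: random logits stand in for the two networks' outputs.
rng = np.random.default_rng(0)
H, W, C = 8, 8, 2
p_true = softmax(rng.normal(size=(H, W, C)))         # "segmentation" network
cm = softmax(rng.normal(size=(H, W, C, C)), axis=2)  # "annotation" network
p_noisy = noisy_label_distribution(p_true, cm)

coarse_positive = rng.integers(0, C, size=(H, W))    # noisy "is class k" labels
coarse_negative = rng.integers(0, C, size=(H, W))    # noisy "is not class k" labels
loss = cross_entropy(p_noisy, coarse_positive) + complementary_loss(p_noisy, coarse_negative)
```

With an identity confusion matrix at every pixel, the noisy distribution equals the true one, so the sketch reduces to ordinary supervised training on the coarse labels. Any regularization that keeps the two networks from trading off against each other is outside the scope of this sketch.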
Peer Reviews
Decision: ICLR 2024 Conference Withdrawn Submission
- The problem this paper is trying to solve is interesting: how to effectively train a good segmentation network with only coarse labels? It has good potential in real-world applications.
- By using the estimated confusion matrices, the idea of simultaneously constructing a mapping relationship for the input image to both noisy coarse annotations and true segmentation labels is interesting.
- The proposed method models and disentangles the complex mappings from the input images to the noisy coar
- The authors mentioned the use of complementary label learning for estimating the distribution of negative coarse labels. Furthermore, it is claimed in Section 3.1 that the proposed method is also applicable to cases involving only positive or negative coarse labels. Does the model work with only negative coarse labels? I am expecting to see the corresponding experiments/ablation studies to support the claims (only uses negative coarse labels).
- One of the basic/main assumptions in this paper
Indeed, performing accurate annotations for large datasets for the purposes of segmentation is extremely time-consuming and unrealistic, hence the paper addresses an important, real-world problem. The paper is also clearly written and does a good job of providing a summary of the state-of-the-art in the literature. I like the basic idea of learning the unobserved true segmentation distribution by using a second CNN simultaneously with the first CNN estimating the correct segmentation. This makes
A major weakness of the paper is evaluation. Datasets such as MNIST are not meaningful for evaluating the performance of an image segmentation algorithm. For the retinal image dataset, the evaluation is quite unrealistic since the challenge with coarse segmentation is mainly the variation in how deep the vessel trees are segmented. For example, some annotators may only annotate the major vessels, others will annotate smaller vessels further down in the vessel tree. This is not taken into account
The authors adapt a method proposed for learning image classification from noisy labels to learning image segmentation from coarse labels. Furthermore, they extend the approach to work with positive and negative class labels. Quantitative results indicate competitive performance of the proposed approach, suggesting good potential for practical use in learning segmentation from coarse labels.
The work lacks clarity in its description of the coarse labels employed for evaluation. Furthermore, the work lacks a discussion of the cases in which its basic assumptions and necessary conditions hold. Consequently, the soundness and practical value of the proposed method remain unclear. In more detail: It remains unclear precisely how the synthetic coarse annotations were generated. The nature of the coarse labels is, however, crucial for the success of the proposed approach. E.g., if coarse
