TL;DR
This paper introduces SODGAN, a novel method that generates high-quality synthetic image-mask pairs for salient object detection, reducing the need for extensive labeled data and achieving state-of-the-art results.
Contribution
The authors present SODGAN as the first approach to generate synthetic training data for SOD. It combines a diffusion embedding network, a few-shot saliency mask generator, and a quality-aware discriminator so that SOD models can be trained with only a few labeled examples.
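To make the data flow concrete, the following is a minimal toy sketch of a generate-then-filter loop: sample a latent code, produce an image and a mask, and keep the pair only if a quality score passes a threshold. It is not the authors' implementation. Every function name, the Gaussian latent sampler (standing in for the diffusion embedding network), the brightness-threshold mask, the scoring heuristic, and the `min_quality` parameter are hypothetical placeholders chosen only so the example runs with the standard library.

```python
import random
from typing import List, Tuple

Image = List[List[float]]
Mask = List[List[int]]


def sample_latent(rng: random.Random, dim: int = 16) -> List[float]:
    """Hypothetical stand-in for latent code generation: a standard normal draw."""
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]


def toy_image_generator(latent: List[float], size: int = 4) -> Image:
    """Hypothetical stand-in for a pretrained generator: latent -> image in [0, 1]."""
    return [
        [min(1.0, abs(latent[(r * size + c) % len(latent)])) for c in range(size)]
        for r in range(size)
    ]


def toy_mask_generator(image: Image) -> Mask:
    """Hypothetical stand-in for the mask generator: a fixed brightness threshold."""
    return [[1 if px > 0.5 else 0 for px in row] for row in image]


def toy_quality_score(mask: Mask) -> float:
    """Hypothetical stand-in for the quality-aware discriminator, score in [0, 1].

    Arbitrary heuristic: masks that are almost entirely foreground or
    almost entirely background receive a low score.
    """
    flat = [v for row in mask for v in row]
    foreground = sum(flat) / len(flat)
    return 1.0 - 2.0 * abs(foreground - 0.5)


def synthesize_pairs(
    n_pairs: int, min_quality: float = 0.5, seed: int = 0, max_tries: int = 10_000
) -> List[Tuple[Image, Mask]]:
    """Collect up to n_pairs image-mask pairs whose score is at least min_quality."""
    rng = random.Random(seed)
    pairs: List[Tuple[Image, Mask]] = []
    for _ in range(max_tries):
        if len(pairs) == n_pairs:
            break
        image = toy_image_generator(sample_latent(rng))
        mask = toy_mask_generator(image)
        if toy_quality_score(mask) >= min_quality:  # discard low-quality pairs
            pairs.append((image, mask))
    return pairs
```

In this sketch, a call such as `synthesize_pairs(100, min_quality=0.5)` would return the accepted pairs as a training set for a downstream SOD model. The filtering step is the only part meant to mirror the summary above. The real components are learned networks.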
Findings
Models trained on the synthetic data reach 98.4% of the performance of models trained on real data.
SODGAN outperforms several fully supervised state-of-the-art methods.
It also excels in semi-supervised and weakly supervised settings.
Abstract
Although deep salient object detection (SOD) has achieved remarkable progress, deep SOD models are extremely data-hungry, requiring large-scale pixel-wise annotations to deliver such promising results. In this paper, we propose a novel yet effective method for SOD, coined SODGAN, which can generate infinite high-quality image-mask pairs requiring only a few labeled data, and these synthesized pairs can replace the human-labeled DUTS-TR to train any off-the-shelf SOD model. Its contribution is three-fold. 1) Our proposed diffusion embedding network can address the manifold mismatch and is tractable for the latent code generation, better matching with the ImageNet latent space. 2) For the first time, our proposed few-shot saliency mask generator can synthesize infinite accurate image synchronized saliency masks with a few labeled data. 3) Our proposed quality-aware discriminator can…
Taxonomy
Methods: Diffusion
