A Training-free Synthetic Data Selection Method for Semantic Segmentation
Hao Tang, Siyue Yu, Jian Pang, Bingfeng Zhang

TL;DR
This paper introduces a training-free method using CLIP to select high-quality synthetic data for semantic segmentation, improving model performance while reducing data size.
Contribution
The proposed Synthetic Data Selection (SDS) strategy effectively filters synthetic data without training, enhancing segmentation accuracy and data efficiency.
Findings
Reduces synthetic dataset size by half.
Achieves higher segmentation performance with selected data.
Demonstrates effectiveness across multiple experiments.
Abstract
Training semantic segmenters with synthetic data has attracted great attention due to its easy accessibility and huge quantity. Most previous methods focus on producing large-scale synthetic image-annotation samples and then training the segmenter with all of them. However, this approach faces a main challenge: poor-quality samples are unavoidable, and training on them damages the training process. In this paper, we propose a training-free Synthetic Data Selection (SDS) strategy with CLIP to select high-quality samples for building a reliable synthetic dataset. Specifically, given massive synthetic image-annotation pairs, we first design a Perturbation-based CLIP Similarity (PCS) to measure the reliability of each synthetic image, thus removing samples with low-quality images. Then we propose a class-balance Annotation Similarity Filter (ASF) by…
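The abstract does not give the PCS formula, so the following is only an illustrative sketch of a perturbation-based selection step, not the authors' implementation. It assumes, hypothetically, that an image's reliability is scored as the drop in image-text cosine similarity after the image is perturbed, and that the top half of samples is kept (matching the reported halving of the dataset). The embeddings are toy vectors standing in for CLIP features; `perturbation_score` and `select_top_fraction` are made-up names.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def perturbation_score(img_emb, perturbed_emb, text_emb):
    """Assumed score (not the paper's formula): how much the image-text
    similarity drops once the image is perturbed."""
    return cosine(img_emb, text_emb) - cosine(perturbed_emb, text_emb)


def select_top_fraction(scores, keep_fraction=0.5):
    """Return the indices of the highest-scoring samples, in ascending order."""
    n_keep = max(1, int(len(scores) * keep_fraction))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:n_keep])


# Toy stand-ins for CLIP embeddings: (original image, perturbed image).
text = [1.0, 0.0]
samples = [
    ([1.0, 0.0], [0.0, 1.0]),
    ([1.0, 1.0], [1.0, 1.0]),
    ([1.0, 0.0], [1.0, 1.0]),
    ([0.0, 1.0], [0.0, 1.0]),
]

scores = [perturbation_score(orig, pert, text) for orig, pert in samples]
kept = select_top_fraction(scores, keep_fraction=0.5)
print(kept)
```

In a real pipeline the toy vectors would be replaced by features from a CLIP image encoder and text encoder, and the selection would need to account for class balance, which this sketch omits.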
Taxonomy
Topics: Web Data Mining and Analysis · Advanced Clustering Algorithms Research · Machine Learning and Data Classification
Methods: Softmax · Attention Is All You Need · Contrastive Language-Image Pre-training
