A Training-free Synthetic Data Selection Method for Semantic Segmentation

Hao Tang; Siyue Yu; Jian Pang; Bingfeng Zhang

arXiv:2501.15201 · cs.CV · January 28, 2025

TL;DR

This paper introduces a training-free method using CLIP to select high-quality synthetic data for semantic segmentation, improving model performance while reducing data size.

Contribution

The proposed Synthetic Data Selection (SDS) strategy effectively filters synthetic data without training, enhancing segmentation accuracy and data efficiency.

Findings

1. Halves the size of the synthetic dataset.
2. Achieves higher segmentation performance with the selected data than with the full synthetic set.
3. Demonstrates effectiveness across multiple experiments.

Abstract

Training a semantic segmenter with synthetic data has attracted great attention because such data is easy to obtain in huge quantities. Most previous methods focused on producing large-scale synthetic image-annotation samples and then training the segmenter on all of them. However, this approach faces a major challenge: poor-quality samples are unavoidable, and training on them damages the training process. In this paper, we propose a training-free Synthetic Data Selection (SDS) strategy with CLIP to select high-quality samples for building a reliable synthetic dataset. Specifically, given massive synthetic image-annotation pairs, we first design a Perturbation-based CLIP Similarity (PCS) to measure the reliability of synthetic images, thus removing samples with low-quality images. Then we propose a class-balance Annotation Similarity Filter (ASF) by…
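The abstract names the two selection stages but is cut off before defining them, so the following is only a minimal sketch under stated assumptions, not the authors' implementation (the official code is in the tanghao2000/sds repository). It works on embedding arrays with NumPy so no CLIP model is loaded; the demo uses random toy vectors in place of real CLIP features. `pcs_score` assumes PCS is the drop in image-text cosine similarity after the image is perturbed (for example by shuffling patches), and `class_balanced_select` illustrates only the class-balanced part of ASF by keeping the top `keep_ratio` of samples per class under some per-sample score. All function names, the scoring rule, the use of each image's prompted class as its label, and `keep_ratio=0.5` (chosen to mirror the reported halving of the dataset) are hypothetical.

```python
import numpy as np


def cosine(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Row-wise cosine similarity between two (N, D) embedding arrays."""
    a = a / np.linalg.norm(a, axis=-1, keepdims=True)
    b = b / np.linalg.norm(b, axis=-1, keepdims=True)
    return np.sum(a * b, axis=-1)


def pcs_score(img_emb: np.ndarray, perturbed_emb: np.ndarray,
              text_emb: np.ndarray) -> np.ndarray:
    """Hypothetical PCS: how much image-text similarity drops after perturbation.

    Assumption (not from the source): a reliable synthetic image loses more
    alignment with its class prompt once its structure is perturbed.
    """
    return cosine(img_emb, text_emb) - cosine(perturbed_emb, text_emb)


def class_balanced_select(scores, labels, keep_ratio: float = 0.5) -> list:
    """Keep the highest-scoring `keep_ratio` fraction of samples in each class.

    `scores` stands in for whatever per-sample quality score is used; the
    real ASF score is not defined in the truncated abstract.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    keep = []
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)                 # samples of class c
        k = max(1, int(round(keep_ratio * len(idx))))     # at least one per class
        order = np.argsort(-scores[idx], kind="stable")   # best score first
        keep.extend(idx[order[:k]].tolist())
    return sorted(keep)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d = 8, 16
    text = rng.normal(size=(n, d))               # toy class-prompt embedding per image
    img = text + 0.1 * rng.normal(size=(n, d))   # toy images aligned with their prompts
    perturbed = rng.normal(size=(n, d))          # toy perturbed views with lost alignment
    labels = np.array([0, 0, 0, 0, 1, 1, 1, 1])  # prompted class of each image
    scores = pcs_score(img, perturbed, text)
    print(class_balanced_select(scores, labels, keep_ratio=0.5))
```

Selecting within each class rather than globally keeps rare classes from being crowded out when one class dominates the highest scores, which is one plausible reason for a class-balanced filter.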

Code & Models

Repositories

tanghao2000/sds (PyTorch, official)

Videos

A Training-free Synthetic Data Selection Method for Semantic Segmentation · Underline

Taxonomy

Topics: Web Data Mining and Analysis · Advanced Clustering Algorithms Research · Machine Learning and Data Classification

Methods: Softmax · Attention Is All You Need · Contrastive Language-Image Pre-training