SynthSeg-Agents: Multi-Agent Synthetic Data Generation for Zero-Shot Weakly Supervised Semantic Segmentation

Wangyu Wu; Zhenhong Chen; Xiaowei Huang; Fei Ma; Jimin Xiao

arXiv:2512.15310 · cs.CV · December 18, 2025


TL;DR

This paper introduces SynthSeg-Agents, a multi-agent framework driven by Large Language Models (LLMs) that generates synthetic training data for zero-shot weakly supervised semantic segmentation, eliminating the need for real images.

Contribution

The paper proposes a novel multi-agent system that uses LLMs and vision-language models to generate high-quality synthetic data for semantic segmentation without supervision from real images.

Findings

1. Achieves competitive segmentation performance on the PASCAL VOC 2012 and COCO 2014 datasets.

2. Demonstrates the effectiveness of LLM-driven synthetic data generation in zero-shot settings.

3. Reduces reliance on real annotated images for training semantic segmentation models.

Abstract

Weakly Supervised Semantic Segmentation (WSSS) with image-level labels aims to produce pixel-level predictions without requiring dense annotations. While recent approaches have leveraged generative models to augment existing data, they remain dependent on real-world training samples. In this paper, we introduce a novel direction, Zero-Shot Weakly Supervised Semantic Segmentation (ZSWSSS), and propose SynthSeg-Agents, a multi-agent framework driven by Large Language Models (LLMs) to generate synthetic training data entirely without real images. SynthSeg-Agents comprises two key modules: a Self-Refine Prompt Agent and an Image Generation Agent. The Self-Refine Prompt Agent autonomously crafts diverse and semantically rich image prompts via iterative refinement, memory mechanisms, and prompt-space exploration, guided by CLIP-based similarity and nearest-neighbor diversity filtering. These…
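The abstract says the prompt agent is guided by two filters: CLIP-based similarity, which keeps prompts on-class, and nearest-neighbor diversity filtering, which rejects near-duplicates. The sketch below shows one plausible way to combine them. It is an illustration under stated assumptions, not the authors' implementation. It assumes every candidate prompt has already been embedded, for example by a CLIP text encoder, and it uses toy 3-dimensional vectors in place of real embeddings. The names `filter_prompts` and `cosine`, the thresholds `min_relevance` and `max_neighbor_sim`, and the use of a running `memory` list as the "memory mechanism" are all hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple

Vector = Sequence[float]
Candidate = Tuple[str, Vector]  # (prompt text, precomputed embedding)


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def filter_prompts(
    candidates: List[Candidate],
    class_text_emb: Vector,
    min_relevance: float = 0.25,    # hypothetical CLIP-similarity floor
    max_neighbor_sim: float = 0.9,  # hypothetical near-duplicate ceiling
    memory: Optional[List[Candidate]] = None,
) -> List[str]:
    """Keep prompts that are on-class and not near-duplicates of accepted ones.

    `memory` (hypothetical) holds prompts accepted so far. It is extended in
    place, so passing the same list across refinement rounds stops later
    rounds from re-accepting prompts close to ones already kept.
    """
    if memory is None:
        memory = []
    accepted: List[str] = []
    # Visit the most class-relevant candidates first, so that when two
    # prompts are near-duplicates, the more relevant one is the one kept.
    ranked = sorted(
        candidates, key=lambda c: cosine(c[1], class_text_emb), reverse=True
    )
    for prompt, emb in ranked:
        # Relevance filter: drop prompts that drift away from the class.
        if cosine(emb, class_text_emb) < min_relevance:
            continue
        # Diversity filter: compare with the nearest prompt already kept.
        nearest = max((cosine(emb, m) for _, m in memory), default=-1.0)
        if nearest > max_neighbor_sim:
            continue
        memory.append((prompt, emb))
        accepted.append(prompt)
    return accepted


# Toy 3-d vectors standing in for real text embeddings (made up for illustration).
cat_class = [1.0, 0.0, 0.0]
round_one = [
    ("a cat on a sofa", [0.9, 0.1, 0.0]),
    ("a cat sitting on a sofa", [0.9, 0.12, 0.0]),  # near-duplicate of the first
    ("a cat in the snow", [0.8, 0.0, 0.6]),
    ("a city skyline at night", [0.0, 1.0, 0.0]),  # off-class
]
memory: List[Candidate] = []
selected = filter_prompts(round_one, cat_class, memory=memory)
print(selected)  # ['a cat on a sofa', 'a cat in the snow']
```

In this toy run, the skyline prompt fails the relevance check and the "sitting on a sofa" variant fails the diversity check. Visiting candidates in relevance order is only one design choice. A greedy farthest-point selection, which repeatedly picks the candidate least similar to everything already kept, would be another way to enforce diversity.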


Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning