SynthSeg-Agents: Multi-Agent Synthetic Data Generation for Zero-Shot Weakly Supervised Semantic Segmentation
Wangyu Wu, Zhenhong Chen, Xiaowei Huang, Fei Ma, Jimin Xiao

TL;DR
This paper introduces SynthSeg Agents, a multi-agent framework driven by Large Language Models to generate synthetic training data for zero-shot weakly supervised semantic segmentation, eliminating the need for real images.
Contribution
It proposes a novel multi-agent system utilizing LLMs and vision-language models to generate high-quality synthetic data for semantic segmentation without real image supervision.
Findings
Achieves competitive segmentation performance on PASCAL VOC 2012 and COCO 2014 datasets.
Demonstrates the effectiveness of LLM-driven synthetic data generation in zero-shot settings.
Reduces reliance on real annotated images for training semantic segmentation models.
Abstract
Weakly Supervised Semantic Segmentation (WSSS) with image-level labels aims to produce pixel-level predictions without requiring dense annotations. While recent approaches have leveraged generative models to augment existing data, they remain dependent on real-world training samples. In this paper, we introduce a novel direction, Zero-Shot Weakly Supervised Semantic Segmentation (ZSWSSS), and propose SynthSeg Agents, a multi-agent framework driven by Large Language Models (LLMs) to generate synthetic training data entirely without real images. SynthSeg Agents comprises two key modules: a Self-Refine Prompt Agent and an Image Generation Agent. The Self-Refine Prompt Agent autonomously crafts diverse and semantically rich image prompts via iterative refinement, memory mechanisms, and prompt-space exploration, guided by CLIP-based similarity and nearest-neighbor diversity filtering. These…
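The abstract mentions prompt selection guided by CLIP-based similarity and nearest-neighbor diversity filtering, but this summary gives no further detail. The sketch below is only an illustration of how such a filter could work, not the authors' method: it assumes each prompt and the class name have already been embedded (for example by a CLIP text encoder, which is not called here), and the function names, the greedy strategy, and both thresholds are hypothetical.

```python
import math
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def filter_prompts(
    prompt_embs: Sequence[Sequence[float]],
    class_emb: Sequence[float],
    min_relevance: float = 0.5,     # hypothetical threshold, not from the paper
    max_neighbor_sim: float = 0.9,  # hypothetical threshold, not from the paper
) -> List[int]:
    """Greedily keep indices of prompts that are relevant and not redundant.

    A prompt is kept when (1) its similarity to the class embedding is at
    least min_relevance and (2) its similarity to the nearest already-kept
    prompt is at most max_neighbor_sim.
    """
    kept: List[int] = []
    for i, emb in enumerate(prompt_embs):
        # Relevance check against the target class embedding.
        if cosine(emb, class_emb) < min_relevance:
            continue
        # Diversity check against the nearest neighbor among kept prompts.
        nearest = max((cosine(emb, prompt_embs[j]) for j in kept), default=-1.0)
        if nearest > max_neighbor_sim:
            continue
        kept.append(i)
    return kept


# Toy 2-D embeddings standing in for real text embeddings.
toy_class = [1.0, 0.0]
toy_prompts = [[1.0, 0.1], [1.0, 0.11], [0.0, 1.0], [0.8, 0.6]]
selected = filter_prompts(toy_prompts, toy_class)
print(selected)
```

In the toy example, the second prompt is a near-duplicate of the first and the third is unrelated to the class, so a filter of this kind would discard both. A greedy pass is order-dependent; it is used here only because it is the simplest way to show the two criteria working together.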
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Multimodal Machine Learning Applications · Domain Adaptation and Few-Shot Learning
