Seg2Any: Open-set Segmentation-Mask-to-Image Generation with Precise Shape and Semantic Control
Danfeng Li, Hui Zhang, Sheng Wang, Jiacheng Li, Zuxuan Wu

TL;DR
Seg2Any is a segmentation-mask-to-image (S2I) generation framework built on multimodal diffusion transformers (e.g. FLUX). It targets precise control over spatial layout, semantics, and shape by decoupling the segmentation mask condition into regional semantic and high-frequency shape components, and it is supported by a large-scale dataset.
Contribution
The paper presents Seg2Any, a segmentation-mask-to-image (S2I) framework that decouples semantic and shape conditions and adds attribute isolation, together with a large dataset for open-set generation, to improve spatial and attribute control in image synthesis.
Findings
State-of-the-art performance on S2I benchmarks.
Effective semantic and shape consistency in generated images.
Robust attribute control in multi-entity scenarios.
Abstract
Despite recent advances in diffusion models, top-tier text-to-image (T2I) models still struggle to achieve precise spatial layout control, i.e. accurately generating entities with specified attributes and locations. Segmentation-mask-to-image (S2I) generation has emerged as a promising solution by incorporating pixel-level spatial guidance and regional text prompts. However, existing S2I methods fail to simultaneously ensure semantic consistency and shape consistency. To address these challenges, we propose Seg2Any, a novel S2I framework built upon advanced multimodal diffusion transformers (e.g. FLUX). First, to achieve both semantic and shape consistency, we decouple segmentation mask conditions into regional semantic and high-frequency shape components. The regional semantic condition is introduced by a Semantic Alignment Attention Mask, ensuring that generated entities adhere to…
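The abstract is cut off before it describes how the Semantic Alignment Attention Mask is built, so the following is only a rough illustration of the general idea of a region-restricted attention mask, not the authors' implementation. The function name `build_region_attention_mask`, the token ordering (text tokens first, then image tokens in row-major order), and the choice to leave image-to-image attention unrestricted are all assumptions made for this sketch.

```python
import numpy as np


def build_region_attention_mask(label_map, text_spans):
    """Return a boolean (T + N) x (T + N) mask; True means attention is allowed.

    Hypothetical layout, assumed for illustration only:
      label_map  : (H, W) int array, one region id per image token.
      text_spans : dict mapping region id -> (start, end) token range of
                   that region's text prompt.
    Tokens are ordered as [all text tokens, image tokens in row-major order].
    """
    region_of_image_token = label_map.reshape(-1)
    num_text = max(end for _, end in text_spans.values())
    num_image = region_of_image_token.size
    total = num_text + num_image
    mask = np.zeros((total, total), dtype=bool)

    for region_id, (start, end) in text_spans.items():
        text_idx = np.arange(start, end)
        img_idx = num_text + np.flatnonzero(region_of_image_token == region_id)
        # Text tokens of one regional prompt attend to each other.
        mask[np.ix_(text_idx, text_idx)] = True
        # Image tokens inside the region attend to that region's prompt,
        # and the prompt attends back to those image tokens.
        mask[np.ix_(img_idx, text_idx)] = True
        mask[np.ix_(text_idx, img_idx)] = True

    # Simplification (assumption): image tokens attend to all image tokens.
    mask[num_text:, num_text:] = True
    return mask


if __name__ == "__main__":
    # Two regions on a 2x2 token grid: top row is region 0, bottom row region 1.
    label_map = np.array([[0, 0], [1, 1]])
    text_spans = {0: (0, 2), 1: (2, 5)}
    print(build_region_attention_mask(label_map, text_spans).astype(int))
```

A mask of this kind could be passed as the boolean attention mask of a transformer attention layer, so that each regional prompt influences only the image tokens inside its own segmentation region.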
Taxonomy
Topics: Medical Image Segmentation Techniques · Advanced Neural Network Applications · 3D Surveying and Cultural Heritage
