MUSE: Multi-Subject Unified Synthesis via Explicit Layout Semantic Expansion

Fei Peng; Junqiang Wu; Yan Li; Tingting Gao; Di Zhang; Huiyuan Fu

arXiv:2508.14440·cs.CV·August 21, 2025

TL;DR

MUSE is a unified framework for multi-subject image synthesis that achieves precise spatial control and identity preservation by integrating layout and textual guidance through explicit semantic expansion.

Contribution

The paper introduces a concatenated cross-attention (CCA) mechanism and a two-stage training strategy to improve layout-controllable multi-subject synthesis.

Findings

01 Superior spatial accuracy over existing methods

02 Enhanced identity consistency in generated images

03 Effective zero-shot multi-subject synthesis

Abstract

Existing text-to-image diffusion models have demonstrated remarkable capabilities in generating high-quality images guided by textual prompts. However, achieving multi-subject compositional synthesis with precise spatial control remains a significant challenge. In this work, we address the task of layout-controllable multi-subject synthesis (LMS), which requires both faithful reconstruction of reference subjects and their accurate placement in specified regions within a unified image. While recent advancements have separately improved layout control and subject synthesis, existing approaches struggle to simultaneously satisfy the dual requirements of spatial precision and identity preservation in this composite task. To bridge this gap, we propose MUSE, a unified synthesis framework that employs concatenated cross-attention (CCA) to seamlessly integrate layout specifications with…
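The abstract names concatenated cross-attention (CCA) but is cut off before describing it. The sketch below is one plausible reading only, not the authors' implementation: it assumes that layout tokens are appended to the text tokens along the sequence axis, so that image queries attend over both in a single softmax. The function names, the single attention head, and the identity projections (no learned query, key, or value weights) are all simplifications introduced here for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Single-head scaled dot-product attention on plain Python lists.

    Simplification: no learned projections are applied.
    """
    dim = len(keys[0])
    outputs = []
    for q in queries:
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim)
            for k in keys
        ]
        weights = softmax(scores)
        outputs.append([
            sum(w * v[c] for w, v in zip(weights, values))
            for c in range(len(values[0]))
        ])
    return outputs


def concatenated_cross_attention(image_queries, text_tokens, layout_tokens):
    """Hypothetical CCA: attend over text and layout tokens jointly.

    Assumption: "concatenated" means joining the two token sequences
    before attention, so both compete within one softmax.
    """
    context = text_tokens + layout_tokens  # concatenate along the sequence axis
    return cross_attention(image_queries, context, context)


if __name__ == "__main__":
    image_queries = [[1.0, 0.0], [0.0, 1.0]]
    text_tokens = [[1.0, 0.0]]
    layout_tokens = [[0.0, 1.0]]
    print(concatenated_cross_attention(image_queries, text_tokens, layout_tokens))
```

Under this reading, passing an empty layout sequence reduces the function to ordinary text-only cross-attention, which is one reason a concatenation-based design can extend a pretrained text-to-image model without replacing its attention layers.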

Code & Models

Model: pf0607/MUSE (Hugging Face)

Taxonomy

Topics: Semantic Web and Ontologies · Model-Driven Software Engineering Techniques · Modular Robots and Swarm Intelligence