MS-CustomNet: Controllable Multi-Subject Customization with Hierarchical Relational Semantics
Pengxiang Cai, Mengyang Li

TL;DR
MS-CustomNet introduces a hierarchical, controllable framework for multi-subject text-to-image generation, enabling explicit user-defined arrangements and preserving individual subject identities in complex scenes.
Contribution
The paper presents MS-CustomNet, a novel zero-shot multi-subject customization method with hierarchical spatial control and a new MSI dataset for training complex compositions.
Findings
Achieves a DINO-I score of 0.61 for identity preservation.
Attains a YOLO-L score of 0.94 for spatial control.
Demonstrates superior multi-subject image generation quality.
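As context for the identity-preservation figure above: in prior subject-customization work, DINO-I is usually computed as the average pairwise cosine similarity between DINO image embeddings of generated images and reference images of the subject. The sketch below illustrates that common definition only. It assumes the embeddings have already been extracted, and the function names `cosine_similarity` and `dino_i_style_score` are hypothetical. The exact evaluation protocol used by the paper is not stated in this summary.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def dino_i_style_score(
    generated: Sequence[Sequence[float]],
    references: Sequence[Sequence[float]],
) -> float:
    """Mean pairwise cosine similarity between generated and reference embeddings.

    Assumption: each inner sequence is a precomputed image embedding
    (for example from a DINO ViT); feature extraction is out of scope here.
    """
    if not generated or not references:
        raise ValueError("need at least one embedding on each side")
    sims = [cosine_similarity(g, r) for g in generated for r in references]
    return sum(sims) / len(sims)
```

Under this definition the score is bounded by 1, and higher values indicate that generated subjects lie closer to their references in embedding space.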
Abstract
Diffusion-based text-to-image generation has advanced significantly, yet customizing scenes with multiple distinct subjects while maintaining fine-grained control over their interactions remains challenging. Existing methods often struggle to provide explicit user-defined control over the compositional structure and precise spatial relationships between subjects. To address this, we introduce MS-CustomNet, a novel framework for multi-subject customization. MS-CustomNet allows zero-shot integration of multiple user-provided objects and, crucially, empowers users to explicitly define these hierarchical arrangements and spatial placements within the generated image. Our approach ensures individual subject identity preservation while learning and enacting these user-specified inter-subject compositions. We also present the MSI dataset, derived from COCO, to facilitate training on such…
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · 3D Shape Modeling and Analysis · Multimodal Machine Learning Applications
