XVerse: Consistent Multi-Subject Control of Identity and Semantic Attributes via DiT Modulation

Bowen Chen; Mengyi Zhao; Haomiao Sun; Li Chen; Xu Wang; Kang Du; Xinglong Wu

arXiv:2506.21416·cs.CV·June 27, 2025

TL;DR

XVerse introduces a novel method for multi-subject, fine-grained control in text-to-image diffusion models, enabling independent manipulation of subjects and attributes with high fidelity and coherence.

Contribution

The paper proposes a multi-subject control technique based on token-specific text-stream modulation, improving editability and attribute disentanglement in diffusion-transformer-based image synthesis.

Findings

01. Enables precise, independent control over multiple subjects.
02. Maintains high image fidelity and coherence.
03. Improves attribute disentanglement and editability.

Abstract

Achieving fine-grained control over subject identity and semantic attributes (pose, style, lighting) in text-to-image generation, particularly for multiple subjects, often undermines the editability and coherence of Diffusion Transformers (DiTs). Many approaches introduce artifacts or suffer from attribute entanglement. To overcome these challenges, we propose XVerse, a novel multi-subject controlled generation model. By transforming reference images into offsets for token-specific text-stream modulation, XVerse allows precise and independent control of each specific subject without disrupting image latents or features. Consequently, XVerse offers high-fidelity, editable multi-subject image synthesis with robust control over individual subject characteristics and semantic attributes. This advancement significantly improves personalized and complex scene generation capabilities.
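The idea of token-specific text-stream modulation can be illustrated with a toy sketch. This is not the authors' implementation: it assumes an adaLN-style modulation of the form x * (1 + scale) + shift, omits layer normalization, and uses hypothetical names (`modulate_text_stream`, `subject_offsets`) and hand-written offsets in place of values that a model would predict from a reference image. It only shows how a shared modulation can be adjusted for selected text tokens while all other tokens are left unchanged.

```python
def modulate(token, shift, scale):
    """adaLN-style modulation of one token vector: x * (1 + scale) + shift."""
    return [x * (1.0 + s) + b for x, s, b in zip(token, scale, shift)]


def modulate_text_stream(tokens, base_shift, base_scale, subject_offsets):
    """Apply the shared modulation to every text token, adding a per-token
    offset only at the positions listed in subject_offsets.

    subject_offsets maps token index -> (delta_shift, delta_scale). These are
    hypothetical stand-ins for offsets derived from a reference image.
    """
    out = []
    for i, tok in enumerate(tokens):
        shift, scale = base_shift, base_scale
        if i in subject_offsets:
            d_shift, d_scale = subject_offsets[i]
            shift = [a + b for a, b in zip(base_shift, d_shift)]
            scale = [a + b for a, b in zip(base_scale, d_scale)]
        out.append(modulate(tok, shift, scale))
    return out


# Toy prompt "a dog and a cat": 5 text tokens, feature width 2.
tokens = [[1.0, 1.0] for _ in range(5)]
base_shift = [0.0, 0.0]
base_scale = [0.0, 0.0]

# Assumed offsets for the tokens "dog" (index 1) and "cat" (index 4).
subject_offsets = {
    1: ([0.5, 0.0], [0.0, 1.0]),
    4: ([-0.5, 0.0], [0.0, 0.0]),
}

result = modulate_text_stream(tokens, base_shift, base_scale, subject_offsets)
```

In this sketch only the two subject tokens receive a different modulation, and each subject has its own offset, which mirrors the abstract's claim of independent per-subject control that leaves image latents and features untouched.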

Code & Models

Repositories

bytedance/xverse
PyTorch · Official

Models

ByteDance/XVerse (Hugging Face)
model · 41 downloads · 89 likes

Videos

XVerse: Consistent Multi-Subject Control of Identity and Semantic Attributes via DiT Modulation · SlidesLive

Taxonomy

Topics: Cognitive Computing and Networks · Robotics and Automated Systems · Big Data and Digital Economy

Methods: Diffusion