DVI: Disentangling Semantic and Visual Identity for Training-Free Personalized Generation

Guandong Li; Yijun Ding

arXiv:2512.18964·cs.CV·December 23, 2025

TL;DR

DVI is a zero-shot framework that disentangles semantic and visual identity to improve personalized image generation, enhancing visual consistency and atmospheric fidelity without training.

Contribution

DVI introduces a novel disentanglement of identity into semantic and visual streams using VAE statistics and a parameter-free modulation, enabling training-free personalization.

Findings

01. Significantly improves visual consistency and atmospheric fidelity.

02. Outperforms state-of-the-art methods in identity preservation.

03. Operates without parameter fine-tuning.

Abstract

Recent tuning-free identity customization methods achieve high facial fidelity but often overlook visual context, such as lighting, skin texture, and environmental tone. This limitation leads to "Semantic-Visual Dissonance," where accurate facial geometry clashes with the input's unique atmosphere, causing an unnatural "sticker-like" effect. We propose **DVI (Disentangled Visual-Identity)**, a zero-shot framework that orthogonally disentangles identity into fine-grained semantic and coarse-grained visual streams. Unlike methods relying solely on semantic vectors, DVI exploits the inherent statistical properties of the VAE latent space, utilizing mean and variance as lightweight descriptors for global visual atmosphere. We introduce a **Parameter-Free Feature Modulation** mechanism that adaptively modulates semantic embeddings with these visual statistics, effectively injecting the…
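The abstract is cut off before it gives the modulation formula, so the following is only an illustrative sketch of the general idea: summarize a VAE latent by its mean and spread, then use those statistics to rescale and shift a semantic embedding with no learned weights. The AdaIN-style form, the collapse of per-channel statistics into two scalars, and the names `latent_statistics` and `modulate` are all assumptions made for illustration, not the authors' method.

```python
from statistics import fmean, pstdev


def latent_statistics(latent):
    """Return per-channel (means, stds) for a latent.

    `latent` is a list of channels, each a flat list of floats.
    This is a toy stand-in for a C x H x W VAE latent.
    """
    means = [fmean(channel) for channel in latent]
    stds = [pstdev(channel) for channel in latent]
    return means, stds


def modulate(embedding, means, stds, eps=1e-6):
    """Hypothetical parameter-free modulation (AdaIN-style assumption).

    The embedding is normalized to zero mean and unit spread, then
    rescaled by the average latent std and shifted by the average
    latent mean. No trainable parameters are involved.
    """
    mu_e = fmean(embedding)
    sigma_e = pstdev(embedding)
    scale = fmean(stds)   # scalar summary of visual "contrast"
    shift = fmean(means)  # scalar summary of visual "tone"
    return [scale * (x - mu_e) / (sigma_e + eps) + shift for x in embedding]


# Toy usage: a 2-channel latent and a 4-dimensional semantic embedding.
toy_latent = [[0.1, 0.3, 0.2, 0.4], [-0.5, -0.1, -0.3, -0.3]]
toy_embedding = [0.9, -1.2, 0.4, 0.0]
ch_means, ch_stds = latent_statistics(toy_latent)
modulated = modulate(toy_embedding, ch_means, ch_stds)
```

After modulation the embedding's mean and spread match the scalar summaries of the latent, which is the sense in which visual statistics are "injected" in this sketch. A real system would more plausibly operate on tensors and might keep per-channel statistics rather than averaging them.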

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Face recognition and analysis · 3D Shape Modeling and Analysis