Optimizing ID Consistency in Multimodal Large Models: Facial Restoration via Alignment, Entanglement, and Disentanglement
Yuran Dong, Hang Dai, Mang Ye

TL;DR
This paper introduces EditedID, a novel framework that enhances facial identity consistency in multimodal large models during portrait editing by aligning, disentangling, and selectively entangling features, achieving state-of-the-art results.
Contribution
Proposes a training-free, plug-and-play framework with three key components that improve facial ID preservation in multimodal editing models by addressing cross-source distribution bias and cross-source feature contamination.
Findings
Achieves state-of-the-art facial ID consistency
Effective in open-world, multi-person scenarios
Provides a practical, deployable solution
Abstract
Large multimodal editing models have demonstrated powerful editing capabilities across diverse tasks. However, a long-standing limitation is the decline in facial identity (ID) consistency during realistic portrait editing. Because the human eye is highly sensitive to facial features, such inconsistency significantly hinders the practical deployment of these models. Current facial ID preservation methods struggle to consistently restore both the facial identity and the IP of edited elements due to Cross-source Distribution Bias and Cross-source Feature Contamination. To address these issues, we propose EditedID, an Alignment-Disentanglement-Entanglement framework for robust identity-specific facial restoration. By systematically analyzing diffusion trajectories, sampler behaviors, and attention properties, we introduce three key components: 1) an adaptive mixing strategy that…
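The summary does not say how facial ID consistency is measured. A common convention in face-editing work, assumed here and not confirmed as this paper's evaluation protocol, is the cosine similarity between face-recognition embeddings of the source portrait and the edited result. The sketch below shows only that scoring step. The embedding extractor is left out, and the helper names `id_similarity` and `mean_id_consistency` are hypothetical.

```python
import math
from typing import Sequence, Tuple, List

def id_similarity(emb_a: Sequence[float], emb_b: Sequence[float]) -> float:
    """Cosine similarity between two face-identity embeddings.

    Assumes the embeddings come from some face-recognition encoder
    (not specified here); values near 1.0 suggest the same identity.
    """
    if len(emb_a) != len(emb_b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(emb_a, emb_b))
    norm_a = math.sqrt(sum(x * x for x in emb_a))
    norm_b = math.sqrt(sum(y * y for y in emb_b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("zero-norm embedding")
    return dot / (norm_a * norm_b)

def mean_id_consistency(
    pairs: List[Tuple[Sequence[float], Sequence[float]]]
) -> float:
    """Average identity similarity over (source, edited) embedding pairs."""
    if not pairs:
        raise ValueError("need at least one (source, edited) pair")
    scores = [id_similarity(src, edited) for src, edited in pairs]
    return sum(scores) / len(scores)
```

For example, one perfectly preserved face (similarity 1.0) and one completely changed face (similarity 0.0) average to a consistency score of 0.5. Real pipelines would feed high-dimensional embeddings from a trained face encoder rather than these toy vectors.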
Taxonomy
Topics: Face recognition and analysis · Generative Adversarial Networks and Image Synthesis · Emotion and Mood Recognition
