Optimizing ID Consistency in Multimodal Large Models: Facial Restoration via Alignment, Entanglement, and Disentanglement

Yuran Dong; Hang Dai; Mang Ye

arXiv:2602.18752 · cs.CV · February 24, 2026

TL;DR

This paper introduces EditedID, a novel framework that enhances facial identity consistency in multimodal large models during portrait editing by aligning, disentangling, and selectively entangling features, achieving state-of-the-art results.

Contribution

Proposes a training-free, plug-and-play framework with three key components that improve facial ID preservation in multimodal editing models by addressing cross-source distribution bias and cross-source feature contamination.

Findings

01. Achieves state-of-the-art facial ID consistency

02. Remains effective in open-world, multi-person scenarios

03. Provides a practical, deployable solution

Abstract

Multimodal editing large models have demonstrated powerful editing capabilities across diverse tasks. However, a long-standing limitation is the decline in facial identity (ID) consistency during realistic portrait editing. Because the human eye is highly sensitive to facial features, such inconsistency significantly hinders the practical deployment of these models. Current facial ID preservation methods struggle to consistently restore both the facial identity and the edited-element IP due to Cross-source Distribution Bias and Cross-source Feature Contamination. To address these issues, we propose EditedID, an Alignment-Disentanglement-Entanglement framework for robust identity-specific facial restoration. By systematically analyzing diffusion trajectories, sampler behaviors, and attention properties, we introduce three key components: 1) Adaptive mixing strategy that…
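This summary does not state which metric the paper uses for facial ID consistency. A common choice in the identity-preservation literature is the cosine similarity between face-recognition embeddings of the source portrait and the edited result. The sketch below illustrates that generic idea only; it assumes the embeddings were already extracted by some face encoder, and the function names and toy vectors are hypothetical, not EditedID's evaluation code.

```python
import math
from typing import Sequence


def id_similarity(emb_a: Sequence[float], emb_b: Sequence[float]) -> float:
    """Cosine similarity between two face-identity embeddings.

    Assumption: both vectors come from the same (hypothetical) face-recognition
    encoder. Values near 1.0 suggest the same identity; values near 0.0 suggest
    unrelated identities.
    """
    if len(emb_a) != len(emb_b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(a * b for a, b in zip(emb_a, emb_b))
    norm_a = math.sqrt(sum(a * a for a in emb_a))
    norm_b = math.sqrt(sum(b * b for b in emb_b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("embeddings must be non-zero")
    return dot / (norm_a * norm_b)


def mean_id_consistency(pairs) -> float:
    """Average ID similarity over (source, edited) embedding pairs."""
    scores = [id_similarity(src, edited) for src, edited in pairs]
    return sum(scores) / len(scores)


# Toy usage with made-up 2-D embeddings: one edit keeps the identity,
# the other drifts to an orthogonal (unrelated) identity.
pairs = [([1.0, 0.0], [1.0, 0.0]), ([1.0, 0.0], [0.0, 1.0])]
print(mean_id_consistency(pairs))
```

Cosine similarity ignores embedding magnitude, so only the direction of the identity vector matters; real face encoders produce much higher-dimensional embeddings than these toy vectors.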

Taxonomy

Topics: Face recognition and analysis · Generative Adversarial Networks and Image Synthesis · Emotion and Mood Recognition