CA-IDD: Cross-Attention Guided Identity-Conditional Diffusion for Identity-Consistent Face Swapping
Md Shohel Rana, Tanoy Debnath

TL;DR
CA-IDD is a novel diffusion-based face swapping method that uses multi-modal guidance and hierarchical attention to achieve high identity fidelity and visual realism.
Contribution
It introduces the first diffusion-based face swapping framework with multi-modal guidance and expert supervision for improved identity preservation and semantic coherence.
Findings
Achieves an FID of 11.73, outperforming baselines such as FaceShifter and MegaFS.
Provides stable training and robust generalization in face swapping.
Enables fine-grained regional control across pose and expression variations.
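For reference, FID (Fréchet Inception Distance) is the metric behind the 11.73 figure, and lower is better. Its standard definition compares Gaussian fits (mean $\mu$, covariance $\Sigma$) of Inception features extracted from real ($r$) and generated ($g$) images. The paper's exact evaluation protocol, such as sample count and preprocessing, is not stated in this summary.

```latex
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\left(\Sigma_r + \Sigma_g - 2\left(\Sigma_r \Sigma_g\right)^{1/2}\right)
```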
Abstract
Face swapping aims to generate realistic facial images by transferring the identity of a source face onto a target face while preserving the target's pose, expression, and context. However, existing methods, especially GAN-based ones, often struggle to balance identity preservation and visual realism due to limited controllability and mode collapse. In this paper, we introduce CA-IDD (Cross-Attention Guided Identity-Conditional Diffusion), the first diffusion-based face swapping approach that integrates multi-modal guidance (gaze, identity, and facial parsing) through multi-scale cross-attention. Precomputed identity embeddings are incorporated into the denoising process via hierarchical attention layers, resulting in accurate and consistent identity transfer. To improve semantic coherence and visual quality, we use expert-guided supervision, with facial parsing and…
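The abstract describes precomputed identity embeddings entering the denoiser through multi-scale cross-attention. A minimal sketch of one such conditioning step follows, assuming standard single-head scaled dot-product cross-attention with queries taken from the denoiser's flattened spatial features, keys and values taken from identity tokens, and a residual add. The helper names (`cross_attention`, `condition_all_scales`), the weight shapes, and the residual design are illustrative assumptions, not the authors' implementation; gaze and parsing guidance are omitted.

```python
# Hypothetical sketch of identity-conditioned cross-attention (not the CA-IDD code).
import math
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain (rows x inner) @ (inner x cols) matrix product."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def softmax(row: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one row of attention scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(features: Matrix, id_tokens: Matrix,
                    w_q: Matrix, w_k: Matrix, w_v: Matrix) -> Matrix:
    """Single-head cross-attention: spatial features attend to identity tokens.

    features:  N x C  (flattened feature map, one row per spatial location)
    id_tokens: M x E  (assumed: identity embedding split into M tokens)
    w_q: C x D, w_k: E x D, w_v: E x C (C so the residual add lines up)
    """
    q = matmul(features, w_q)    # N x D, queries from the image features
    k = matmul(id_tokens, w_k)   # M x D, keys from the identity tokens
    v = matmul(id_tokens, w_v)   # M x C, values from the identity tokens
    scale = 1.0 / math.sqrt(len(w_q[0]))
    scores = matmul(q, transpose(k))                          # N x M
    attn = [softmax([s * scale for s in row]) for row in scores]
    injected = matmul(attn, v)                                # N x C
    # Residual connection: identity information is added on top of the features.
    return [[f + i for f, i in zip(f_row, i_row)]
            for f_row, i_row in zip(features, injected)]


def condition_all_scales(
    feature_maps: List[Matrix],
    id_tokens: Matrix,
    weights: List[Tuple[Matrix, Matrix, Matrix]],
) -> List[Matrix]:
    """Apply cross-attention at every scale, reusing the same identity tokens.

    feature_maps[s] holds the flattened features at scale s; weights[s] is the
    (w_q, w_k, w_v) triple for that scale's attention layer (hypothetical).
    """
    return [cross_attention(f, id_tokens, *w)
            for f, w in zip(feature_maps, weights)]


if __name__ == "__main__":
    eye = [[1.0, 0.0], [0.0, 1.0]]
    features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    id_tokens = [[0.5, -0.5]]  # a single identity token
    # With one token, every location attends to it with weight 1.
    print(cross_attention(features, id_tokens, eye, eye, eye))
    # prints [[1.5, -0.5], [0.5, 0.5], [1.5, 0.5]]
```

Because every scale reuses the same identity tokens, the identity signal is injected consistently from coarse to fine resolutions. A practical denoiser would instead use learned multi-head projections over batched tensors rather than Python lists.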
