VFace: A Training-Free Approach for Diffusion-Based Video Face Swapping

Sanoojan Baliah; Yohan Abeysinghe; Rusiru Thushara; Khan Muhammad; Abhinav Dhall; Karthik Nandakumar; Muhammad Haris Khan

arXiv:2602.07835·cs.CV·February 20, 2026

Open Access

TL;DR

VFace is a training-free, modular method for high-quality, temporally consistent face swapping in videos. It adds new attention techniques on top of existing diffusion-based image face swapping models.

Contribution

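The paper introduces a training-free, plug-and-play framework with three attention mechanisms (Frequency Spectrum Attention Interpolation, Target Structure Guidance, and Flow-Guided Attention Temporal Smoothening) that improve video face swapping quality and temporal coherence.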

Findings

01 Significantly improves temporal consistency over frame-wise generation in video face swapping.

02 Enhances visual fidelity without additional training or video-specific fine-tuning.

03 Integrates with existing diffusion-based image face swapping methods.

Abstract

We present VFace, a training-free, plug-and-play method for high-quality face swapping in videos. It can be seamlessly integrated with image-based face swapping approaches built on diffusion models. First, we introduce a Frequency Spectrum Attention Interpolation technique to facilitate generation while keeping key identity characteristics intact. Second, we achieve Target Structure Guidance via plug-and-play attention injection to better align the structural features of the target frame with the generation. Third, we present a Flow-Guided Attention Temporal Smoothening mechanism that reduces the temporal inconsistencies typically encountered in frame-wise generation by enforcing spatiotemporal coherence without modifying the underlying diffusion model. Our method requires no additional training or video-specific fine-tuning. Extensive experiments show that our method significantly enhances…
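To make the general idea behind flow-guided temporal smoothing concrete, here is a minimal sketch in Python with NumPy. It is not the authors' implementation: the abstract does not say how the flow is used inside the attention layers, so the function names (`warp_with_flow`, `smooth_features`), the nearest-neighbour backward warp, and the fixed blending weight `alpha` are all assumptions chosen for illustration. The sketch warps the previous frame's feature map into the current frame using an optical flow field, then blends it with the current frame's features.

```python
import numpy as np


def warp_with_flow(prev_feat, flow):
    """Backward-warp prev_feat (H, W, C) into the current frame.

    Assumption: flow[y, x] = (dx, dy) points from a current-frame
    location to the matching location in the previous frame.
    Nearest-neighbour sampling keeps the sketch short.
    """
    h, w = prev_feat.shape[:2]
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    # Round to the nearest source pixel and clamp to the image border.
    src_x = np.clip(np.rint(xs + flow[..., 0]).astype(int), 0, w - 1)
    src_y = np.clip(np.rint(ys + flow[..., 1]).astype(int), 0, h - 1)
    return prev_feat[src_y, src_x]


def smooth_features(curr_feat, prev_feat, flow, alpha=0.5):
    """Blend current features with flow-aligned previous features.

    alpha is a hypothetical blending weight: 1.0 keeps only the
    current frame, 0.0 keeps only the warped previous frame.
    """
    warped = warp_with_flow(prev_feat, flow)
    return alpha * curr_feat + (1.0 - alpha) * warped
```

With `alpha` close to 1 the output follows the per-frame generation; lowering it pulls each frame toward its motion-compensated predecessor, which is the trade-off any smoothing of this kind has to balance.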

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Face recognition and analysis · Advanced Image Processing Techniques