TL;DR
LivingSwap is a video face swapping model that combines video reference guidance with keyframe conditioning to achieve high fidelity, temporal consistency, and controllable editing in cinematic sequences.
Contribution
This work introduces LivingSwap, the first reference-guided video face swapping model, which combines keyframe conditioning with video reference guidance for improved realism and stability.
Findings
Achieves state-of-the-art face swapping quality with temporal coherence.
Successfully integrates target identity with source expressions and lighting.
Reduces manual effort in film production workflows.
Abstract
Video face swapping is crucial in film and entertainment production, where achieving high fidelity and temporal consistency over long and complex video sequences remains a significant challenge. Inspired by recent advances in reference-guided image editing, we explore whether rich visual attributes from source videos can be similarly leveraged to enhance both fidelity and temporal coherence in video face swapping. Building on this insight, this work presents LivingSwap, the first video-reference-guided face swapping model. Our approach employs keyframes as conditioning signals to inject the target identity, enabling flexible and controllable editing. By combining keyframe conditioning with video reference guidance, the model performs temporal stitching to ensure stable identity preservation and high-fidelity reconstruction across long video sequences. To address the scarcity of data for…
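The abstract describes temporal stitching only at a high level. As a rough illustration of how a long sequence can be processed in pieces, the sketch below uses a generic overlapping sliding window. This is an assumption for illustration, not LivingSwap's published procedure: each window after the first receives the last already-generated frames of the previous window as context, and only its new frames are appended. The names `plan_chunks`, `stitch`, `Chunk`, and the `generate` callback are all hypothetical; `generate` stands in for a model that, in the paper's setting, would also be conditioned on keyframes and the source-video reference.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Sequence

@dataclass
class Chunk:
    start: int  # first frame index of the window (inclusive)
    end: int    # last frame index of the window (exclusive)
    carry: int  # leading frames shared with the previous window, used as context

def plan_chunks(num_frames: int, chunk_len: int, overlap: int) -> List[Chunk]:
    """Split [0, num_frames) into windows of at most chunk_len frames,
    where consecutive windows share `overlap` frames (hypothetical scheme)."""
    if num_frames <= 0:
        raise ValueError("num_frames must be positive")
    if not 0 <= overlap < chunk_len:
        raise ValueError("need 0 <= overlap < chunk_len")
    stride = chunk_len - overlap
    chunks: List[Chunk] = []
    start = 0
    while True:
        end = min(start + chunk_len, num_frames)
        chunks.append(Chunk(start, end, 0 if not chunks else overlap))
        if end == num_frames:
            break
        start += stride
    return chunks

# Hypothetical model stand-in: (start, end, context_frames) -> frames for [start, end).
Generator = Callable[[int, int, Sequence[Any]], Sequence[Any]]

def stitch(num_frames: int, chunk_len: int, overlap: int, generate: Generator) -> List[Any]:
    """Generate a long sequence window by window, feeding the overlapping
    frames already produced back in as context for the next window."""
    out: List[Any] = []
    for c in plan_chunks(num_frames, chunk_len, overlap):
        # Frames generated by the previous window that this window overlaps.
        context = out[c.start:c.start + c.carry]
        frames = generate(c.start, c.end, context)
        # Keep only the frames not already emitted by the previous window.
        out.extend(frames[c.carry:])
    return out
```

With a dummy generator such as `lambda s, e, ctx: list(range(s, e))`, `stitch(10, 4, 1, ...)` processes the windows [0, 4), [3, 7), and [6, 10) and returns exactly 10 frames. The overlap is what lets each window see the previous output, which is one simple way to discourage identity drift at window boundaries.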
