Navigating Large-Pose Challenge for High-Fidelity Face Reenactment with Video Diffusion Model

Mingtao Guo; Guanyu Xing; Yanci Zhang; Yanli Liu

arXiv:2507.16341 · cs.CV · July 23, 2025

TL;DR

This paper introduces FRVD, a novel face reenactment framework that effectively handles large pose variations by combining implicit keypoints, warping correction, and a pretrained video model to produce high-fidelity, temporally coherent videos.

Contribution

The paper proposes a new face reenactment method that leverages implicit keypoints and a warping feature mapper within a pretrained video model to improve performance under large pose changes.

Findings

1. FRVD outperforms existing methods in pose accuracy and identity preservation.

2. The use of a pretrained video model enhances temporal coherence and visual quality.

3. The approach effectively handles extreme pose variations in face reenactment.

Abstract

Face reenactment aims to generate realistic talking head videos by transferring motion from a driving video to a static source image while preserving the source identity. Although existing methods based on either implicit or explicit keypoints have shown promise, they struggle with large pose variations due to warping artifacts or the limitations of coarse facial landmarks. In this paper, we present the Face Reenactment Video Diffusion model (FRVD), a novel framework for high-fidelity face reenactment under large pose changes. Our method first employs a motion extractor to extract implicit facial keypoints from the source and driving images to represent fine-grained motion and to perform motion alignment through a warping module. To address the degradation introduced by warping, we introduce a Warping Feature Mapper (WFM) that maps the warped source image into the motion-aware latent…
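
The keypoint-driven warping step mentioned in the abstract can be illustrated with a small NumPy sketch. This is not FRVD's warping module, whose details are not given here. It is a simplified stand-in that assumes 2D keypoints in normalized [-1, 1] coordinates, blends per-keypoint displacements with Gaussian weights, and samples the source image with nearest-neighbor lookup. The function names, the `sigma` parameter, and the weighting scheme are all hypothetical choices made for illustration.

```python
import numpy as np


def dense_flow_from_keypoints(kp_src, kp_drv, height, width, sigma=0.2):
    """Hypothetical helper: build a backward sampling grid from sparse keypoints.

    kp_src, kp_drv: (K, 2) arrays of (x, y) keypoints in [-1, 1].
    Returns an (H, W, 2) array giving, for each driving-frame pixel,
    the normalized source coordinate to sample from.
    """
    ys, xs = np.meshgrid(
        np.linspace(-1.0, 1.0, height),
        np.linspace(-1.0, 1.0, width),
        indexing="ij",
    )
    grid = np.stack([xs, ys], axis=-1)  # (H, W, 2), identity sampling grid

    # Gaussian weight of every pixel with respect to every driving keypoint.
    diff = grid[:, :, None, :] - kp_drv[None, None, :, :]  # (H, W, K, 2)
    weights = np.exp(-np.sum(diff ** 2, axis=-1) / (2.0 * sigma ** 2))
    # The epsilon keeps the division safe; pixels far from every keypoint
    # end up with a near-zero displacement.
    weights = weights / (weights.sum(axis=-1, keepdims=True) + 1e-8)

    # Per-keypoint displacement from driving position back to source position.
    disp = kp_src - kp_drv  # (K, 2)
    return grid + np.einsum("hwk,kc->hwc", weights, disp)


def warp_nearest(image, flow):
    """Sample `image` at the normalized coordinates in `flow` (nearest neighbor)."""
    h, w = image.shape[:2]
    px = np.clip(np.rint((flow[..., 0] + 1.0) * 0.5 * (w - 1)), 0, w - 1).astype(int)
    py = np.clip(np.rint((flow[..., 1] + 1.0) * 0.5 * (h - 1)), 0, h - 1).astype(int)
    return image[py, px]


# Usage: one keypoint that sits at the image center in the driving frame
# and half a unit to the right in the source frame.
image = np.arange(25).reshape(5, 5)
kp_drv = np.array([[0.0, 0.0]])
kp_src = np.array([[0.5, 0.0]])
flow = dense_flow_from_keypoints(kp_src, kp_drv, 5, 5)
warped = warp_nearest(image, flow)
```

A coarse warp of this kind leaves holes and smearing wherever the source image has no matching content, and that problem grows with the pose difference. The abstract attributes this kind of degradation to warping and introduces the Warping Feature Mapper to address it.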

Taxonomy

Topics: Face recognition and analysis · Facial Nerve Paralysis Treatment and Research · Image and Video Stabilization