Learning Online Scale Transformation for Talking Head Video Generation

Fa-Ting Hong; Dan Xu

arXiv:2407.09965·cs.CV·July 16, 2024

Open Access

TL;DR

This paper introduces an online scale transformation module for talking head video generation that automatically adjusts the scale of the driving face to match the source, improving face reenactment accuracy.

Contribution

The paper proposes a scale transformation module integrated into the generation process, which adjusts scale automatically rather than searching the driving video for an anchor frame that best aligns with the source image.

Findings

01 Accurately adjusts face scale in reenactment

02 Produces high-quality, correctly scaled face videos

03 Outperforms existing methods in scale consistency

Abstract

One-shot talking head video generation uses a source image and driving video to create a synthetic video where the source person's facial movements imitate those of the driving video. However, differences in scale between the source and driving images remain a challenge for face reenactment. Existing methods attempt to locate a frame in the driving video that aligns best with the source image, but imprecise alignment can result in suboptimal outcomes. To this end, we introduce a scale transformation module that can automatically adjust the scale of the driving image to fit that of the source image, by using the information of scale difference maintained in the detected keypoints of the source image and the driving frame. Furthermore, to keep perceiving the scale information of faces during the generation process, we incorporate the scale information learned from the scale…
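The abstract says the scale difference between the two faces is carried by their detected keypoints. As a rough illustration of that idea only, the sketch below estimates a single scale factor from the spread of each keypoint set and rescales the driving keypoints to match the source. This is a hand-written geometric heuristic, not the paper's learned module: the function names, the use of root-mean-square spread, and the toy keypoints are all assumptions made for illustration.

```python
import math


def centroid(keypoints):
    """Mean (x, y) position of a list of 2D keypoints."""
    n = len(keypoints)
    return (sum(x for x, _ in keypoints) / n,
            sum(y for _, y in keypoints) / n)


def keypoint_spread(keypoints):
    """Root-mean-square distance of the keypoints from their centroid.

    Used here as a simple, assumed proxy for face scale.
    """
    cx, cy = centroid(keypoints)
    total = sum((x - cx) ** 2 + (y - cy) ** 2 for x, y in keypoints)
    return math.sqrt(total / len(keypoints))


def estimate_scale(source_kp, driving_kp):
    """Factor by which the driving face should be scaled to match the source."""
    return keypoint_spread(source_kp) / keypoint_spread(driving_kp)


def rescale_keypoints(keypoints, scale):
    """Scale keypoints about their own centroid, leaving position unchanged."""
    cx, cy = centroid(keypoints)
    return [(cx + scale * (x - cx), cy + scale * (y - cy)) for x, y in keypoints]


# Toy data (hypothetical): the driving face is twice as large as the source.
source_kp = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)]
driving_kp = [(10.0, 10.0), (14.0, 10.0), (14.0, 14.0), (10.0, 14.0)]

scale = estimate_scale(source_kp, driving_kp)
aligned_kp = rescale_keypoints(driving_kp, scale)
```

With the toy data, the estimated factor is about 0.5, and the rescaled driving keypoints have the same spread as the source keypoints. A learned module could capture scale cues that a single global ratio like this one cannot, such as pose-dependent changes in apparent face size.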

Taxonomy

Topics: Education and Learning Interventions · Video Analysis and Summarization · Human Motion and Animation