Learning Online Scale Transformation for Talking Head Video Generation
Fa-Ting Hong, Dan Xu

TL;DR
This paper introduces an online scale transformation module for talking head video generation that automatically adjusts the scale of the driving face to match the source, improving face reenactment accuracy.
Contribution
The paper proposes a scale transformation module, integrated into the generation process, that adjusts the scale of the driving face automatically instead of searching the driving video for a frame that best aligns with the source image.
Findings
Accurately adjusts face scale in reenactment
Produces high-quality, correctly scaled face videos
Outperforms existing methods in scale consistency
Abstract
One-shot talking head video generation uses a source image and driving video to create a synthetic video where the source person's facial movements imitate those of the driving video. However, differences in scale between the source and driving images remain a challenge for face reenactment. Existing methods attempt to locate a frame in the driving video that aligns best with the source image, but imprecise alignment can result in suboptimal outcomes. To this end, we introduce a scale transformation module that can automatically adjust the scale of the driving image to fit that of the source image, by using the information of scale difference maintained in the detected keypoints of the source image and the driving frame. Furthermore, to keep perceiving the scale information of faces during the generation process, we incorporate the scale information learned from the scale…
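To make the idea of scale information carried by keypoints concrete, here is a minimal geometric sketch in Python. It is not the paper's learned module: it assumes 2D keypoints given as (x, y) tuples, measures each face's size as the RMS distance of its keypoints from their centroid, and rescales the driving keypoints about their own centroid to match the source. The function names `centroid`, `spread`, `scale_ratio`, and `rescale_driving` are hypothetical and chosen only for illustration.

```python
import math


def centroid(points):
    """Mean (x, y) position of a list of 2D keypoints."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)


def spread(points):
    """RMS distance of the keypoints from their centroid (a rough face size)."""
    cx, cy = centroid(points)
    total = sum((x - cx) ** 2 + (y - cy) ** 2 for x, y in points)
    return math.sqrt(total / len(points))


def scale_ratio(source_kp, driving_kp):
    """Factor by which the driving keypoints must be scaled to match the source."""
    driving_spread = spread(driving_kp)
    if driving_spread == 0:
        raise ValueError("driving keypoints are degenerate (zero spread)")
    return spread(source_kp) / driving_spread


def rescale_driving(source_kp, driving_kp):
    """Scale the driving keypoints about their own centroid to the source's size.

    Assumption: a single isotropic scale factor is enough; pose and expression
    are left untouched because only distances from the centroid are scaled.
    """
    s = scale_ratio(source_kp, driving_kp)
    cx, cy = centroid(driving_kp)
    return [(cx + s * (x - cx), cy + s * (y - cy)) for x, y in driving_kp]


if __name__ == "__main__":
    # Toy data: the driving "face" is twice as large as the source.
    source = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)]
    driving = [(0.0, 0.0), (4.0, 0.0), (4.0, 4.0), (0.0, 4.0)]
    print(scale_ratio(source, driving))
    print(rescale_driving(source, driving))
```

In the toy data the driving square is twice the size of the source, so the ratio comes out at about 0.5 and the rescaled driving keypoints have the same spread as the source while keeping the driving centroid.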
Taxonomy
Topics: Education and Learning Interventions · Video Analysis and Summarization · Human Motion and Animation
