What comprises a good talking-head video generation?: A Survey and Benchmark

Lele Chen; Guofeng Cui; Ziyi Kou; Haitian Zheng; Chenliang Xu

arXiv:2005.03201 · cs.CV · May 8, 2020 · 30 cites

TL;DR

This paper provides a comprehensive benchmark and evaluation framework for talking-head video generation, addressing limitations of subjective human assessments by proposing standardized metrics and analyzing state-of-the-art methods.

Contribution

It introduces a standardized benchmark that combines newly proposed metrics with carefully selected existing ones for evaluating talking-head videos, enabling objective comparison of different approaches.

Findings

1. Identifies key properties of a good talking-head video: identity preservation, lip synchronization, high video quality, and natural motion.
2. Analyzes the strengths and weaknesses of current state-of-the-art methods.
3. Provides reproducible evaluation code for future research.

Abstract

Over the years, performance evaluation has become essential in computer vision, enabling tangible progress in many sub-fields. While talking-head video generation has emerged as a research topic, existing evaluations on this topic present many limitations. For example, most approaches use human subjects (e.g., via Amazon MTurk) to evaluate their research claims directly. This subjective evaluation is cumbersome, unreproducible, and may impede the evolution of new research. In this work, we present a carefully designed benchmark for evaluating talking-head video generation with standardized dataset pre-processing strategies. As for evaluation, we either propose new metrics or select the most appropriate existing ones to evaluate results on what we consider the desired properties of a good talking-head video, namely identity preservation, lip synchronization, high video quality, and…
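To make "objective, reproducible metric" concrete, the sketch below shows two generic measurements of the kind the abstract describes: an identity-preservation score (mean cosine similarity between a reference face embedding and per-frame embeddings of the generated video) and a mouth-landmark distance for lip synchronization. This is a minimal illustration under stated assumptions, not the benchmark's actual metric definitions. It assumes the face embeddings come from some face recognition model and the mouth landmarks from some landmark detector, neither of which is specified here, and the function names are hypothetical.

```python
import math
from typing import Sequence, Tuple

# Hypothetical types: an embedding is a list of floats from an unspecified
# face recognition model; a landmark is an (x, y) point from an unspecified detector.
Vector = Sequence[float]
Point = Tuple[float, float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length embedding vectors."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("zero-norm embedding")
    return dot / (norm_a * norm_b)


def identity_score(reference: Vector, frames: Sequence[Vector]) -> float:
    """Mean cosine similarity between the reference identity embedding and each
    generated frame's embedding (higher = better identity preservation)."""
    if not frames:
        raise ValueError("need at least one frame")
    return sum(cosine_similarity(reference, f) for f in frames) / len(frames)


def mouth_landmark_distance(generated: Sequence[Sequence[Point]],
                            ground_truth: Sequence[Sequence[Point]]) -> float:
    """Mean Euclidean distance between corresponding mouth landmarks, averaged
    over all landmarks in all frames (lower = lip motion closer to ground truth)."""
    if not generated or len(generated) != len(ground_truth):
        raise ValueError("need the same, non-zero number of frames")
    total, count = 0.0, 0
    for gen_frame, gt_frame in zip(generated, ground_truth):
        if len(gen_frame) != len(gt_frame):
            raise ValueError("landmark counts differ within a frame")
        for (gx, gy), (tx, ty) in zip(gen_frame, gt_frame):
            total += math.hypot(gx - tx, gy - ty)
            count += 1
    if count == 0:
        raise ValueError("no landmarks to compare")
    return total / count


if __name__ == "__main__":
    # Toy 2-D embeddings: one frame matches the reference, one is orthogonal.
    print(identity_score([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]]))
    # One frame, two landmarks: offsets of 0 and 5 pixels.
    print(mouth_landmark_distance([[(0.0, 0.0), (3.0, 4.0)]],
                                  [[(0.0, 0.0), (0.0, 0.0)]]))
```

Because both scores are deterministic functions of model outputs, rerunning them on the same videos yields the same numbers, which is the reproducibility property that human MTurk ratings lack.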

Code & Models

Repositories

lelechen63/talking-head-generation-survey (PyTorch, official)

Taxonomy

Topics: Face recognition and analysis · Generative Adversarial Networks and Image Synthesis · Video Analysis and Summarization