A Unified Compression Framework for Efficient Speech-Driven Talking-Face Generation

Bo-Kyeong Kim; Jaemin Kang; Daeun Seo; Hancheol Park; Shinkook Choi; Hyoung-Kyu Song; Hyungshin Kim; Sungsu Lim

arXiv:2304.00471·cs.SD·May 1, 2023·1 cites

Open Access

TL;DR

This paper introduces a lightweight speech-driven talking-face generation model, compressed from Wav2Lip, that has 28× fewer parameters and MACs and runs up to 19× faster on edge GPUs while maintaining output quality, using knowledge distillation and mixed-precision quantization.

Contribution

The study proposes a compact generator architecture, a knowledge distillation scheme that trains it without adversarial learning, and mixed-precision quantization to improve efficiency without sacrificing quality.
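
To make the idea of mixed-precision quantization concrete, here is a minimal sketch of symmetric uniform quantization applied with a different bit width per layer. It is an illustration under stated assumptions, not the paper's procedure: the layer names, the bit-width assignment, and the sample weights are all hypothetical, and this page does not say which layers the authors keep at higher precision.

```python
def quantize(values, bits):
    """Symmetric uniform quantization of a list of floats to signed integer codes."""
    qmax = 2 ** (bits - 1) - 1                    # e.g. 127 for 8 bits, 7 for 4 bits
    max_abs = max(abs(v) for v in values)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    codes = [max(-qmax, min(qmax, round(v / scale))) for v in values]
    return codes, scale


def dequantize(codes, scale):
    """Map integer codes back to approximate float values."""
    return [c * scale for c in codes]


def max_error(values, bits):
    """Largest absolute round-trip error after quantizing to the given bit width."""
    codes, scale = quantize(values, bits)
    restored = dequantize(codes, scale)
    return max(abs(v - r) for v, r in zip(values, restored))


# Hypothetical per-layer bit widths: "mixed precision" means layers need not share one width.
layer_bits = {"encoder.conv1": 8, "decoder.conv1": 4}

# Hypothetical weights, reused for both layers so only the bit width differs.
weights = [0.5, -1.0, 0.25, 0.75]

errors = {name: max_error(weights, bits) for name, bits in layer_bits.items()}
```

With the same weights, the 4-bit layer shows a larger round-trip error than the 8-bit layer, which is the trade-off a mixed-precision scheme balances when it assigns lower precision only to layers that tolerate it.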

Findings

01. 28× reduction in parameters and MACs

02. Up to 19× speedup on edge GPUs

03. Performance comparable to original models

Abstract

Virtual humans have gained considerable attention in numerous industries, e.g., entertainment and e-commerce. As a core technology, synthesizing photorealistic face frames from target speech and facial identity has been actively studied with generative adversarial networks. Despite remarkable results of modern talking-face generation models, they often entail high computational burdens, which limit their efficient deployment. This study aims to develop a lightweight model for speech-driven talking-face synthesis. We build a compact generator by removing the residual blocks and reducing the channel width from Wav2Lip, a popular talking-face generator. We also present a knowledge distillation scheme to stably yet effectively train the small-capacity generator without adversarial learning. We reduce the number of parameters and MACs by 28× while retaining the performance of the…
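
As a rough illustration of why reducing channel width shrinks a generator so quickly, the sketch below counts parameters and MACs for a single bias-free 2D convolution before and after halving its channels. The layer shape and the 0.5 width factor are hypothetical and are not taken from Wav2Lip or from the paper; the point is only that cost scales with the product of input and output channels.

```python
def conv2d_cost(c_in, c_out, kernel, out_h, out_w):
    """Return (parameters, MACs) for one bias-free 2D convolution."""
    params = c_in * c_out * kernel * kernel
    macs = params * out_h * out_w      # one multiply-accumulate per weight per output pixel
    return params, macs


def scale_width(channels, factor):
    """Scale a channel count by a width factor, keeping at least one channel."""
    return max(1, int(channels * factor))


# Hypothetical layer: 64 -> 128 channels, 3x3 kernel, 48x48 output feature map.
base = conv2d_cost(64, 128, 3, 48, 48)

# Same layer with both input and output channels halved.
slim = conv2d_cost(scale_width(64, 0.5), scale_width(128, 0.5), 3, 48, 48)

param_ratio = base[0] / slim[0]
mac_ratio = base[1] / slim[1]
```

Halving the width on both sides of a convolution cuts its parameters and MACs by a factor of four, so width reduction compounds with the removal of whole blocks.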

Taxonomy

Topics: Face recognition and analysis · Generative Adversarial Networks and Image Synthesis · Speech and Audio Processing

Methods: Knowledge Distillation