A Unified Compression Framework for Efficient Speech-Driven Talking-Face Generation
Bo-Kyeong Kim, Jaemin Kang, Daeun Seo, Hancheol Park, Shinkook Choi, Hyoung-Kyu Song, Hyungshin Kim, Sungsu Lim

TL;DR
This paper introduces a lightweight speech-driven talking-face generation model that cuts parameters and MACs by 28× while keeping output quality comparable to the original model, using knowledge distillation and mixed-precision quantization.
Contribution
The study proposes a compact generator architecture, a knowledge distillation training scheme, and mixed-precision quantization to improve efficiency without sacrificing quality.
Findings
28× reduction in parameters and MACs
Up to 19× speedup on edge GPUs
Performance comparable to original models
Abstract
Virtual humans have gained considerable attention in numerous industries, e.g., entertainment and e-commerce. As a core technology, synthesizing photorealistic face frames from target speech and facial identity has been actively studied with generative adversarial networks. Despite their remarkable results, modern talking-face generation models often entail high computational burdens, which limit their efficient deployment. This study aims to develop a lightweight model for speech-driven talking-face synthesis. We build a compact generator by removing the residual blocks and reducing the channel width from Wav2Lip, a popular talking-face generator. We also present a knowledge distillation scheme to stably yet effectively train the small-capacity generator without adversarial learning. We reduce the number of parameters and MACs by 28× while retaining the performance of the…
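To make the effect of reducing channel width concrete, the sketch below counts parameters and multiply-accumulate operations (MACs) for a stack of standard 2D convolutions and compares a baseline against a half-width version. The layer shapes, the `width_mult` value, and the helper names `conv2d_cost`, `scale_channels`, and `total_cost` are all hypothetical choices for illustration; they are not the Wav2Lip architecture or the paper's compact generator, and the 28× figure reported above also reflects the removal of residual blocks, which this sketch does not model.

```python
def conv2d_cost(c_in, c_out, kernel, h_out, w_out, bias=True):
    """Return (parameters, MACs) for one standard 2D convolution layer."""
    weights = kernel * kernel * c_in * c_out
    params = weights + (c_out if bias else 0)
    # Each weight is used once per output spatial position.
    macs = weights * h_out * w_out
    return params, macs


def scale_channels(layers, width_mult):
    """Scale every layer's input and output channels by width_mult.

    Hypothetical helper: a real generator would usually keep the first
    layer's input channels (the image/audio input) unchanged.
    """
    return [
        (max(1, round(c_in * width_mult)), max(1, round(c_out * width_mult)), k, h, w)
        for (c_in, c_out, k, h, w) in layers
    ]


def total_cost(layers):
    """Sum parameters and MACs over a list of (c_in, c_out, k, h, w) layers."""
    costs = [conv2d_cost(*layer) for layer in layers]
    return sum(p for p, _ in costs), sum(m for _, m in costs)


# Hypothetical three-layer stack: (c_in, c_out, kernel, h_out, w_out)
baseline = [(64, 128, 3, 48, 48), (128, 256, 3, 24, 24), (256, 512, 3, 12, 12)]
slim = scale_channels(baseline, 0.5)

base_params, base_macs = total_cost(baseline)
slim_params, slim_macs = total_cost(slim)
param_ratio = base_params / slim_params
mac_ratio = base_macs / slim_macs

print(f"parameter reduction: {param_ratio:.2f}x, MAC reduction: {mac_ratio:.2f}x")
```

Because convolution weights scale with the product of input and output channels, halving both gives a 4× reduction in MACs, and slightly under 4× in parameters since bias terms shrink only linearly. Larger overall reductions require narrower widths, fewer layers, or both.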
Taxonomy
Topics: Face recognition and analysis · Generative Adversarial Networks and Image Synthesis · Speech and Audio Processing
Methods: Knowledge Distillation
