All's well that FID's well? Result quality and metric scores in GAN models for lip-sychronization tasks

Carina Geldhauser; Johan Liljegren; Pontus Nordqvist

arXiv:2212.13810·cs.CV·December 29, 2022

Open Access

TL;DR

This paper evaluates GAN models for lip-synchronization by comparing LipGAN with a new variation, L1WGAN-GP, on the GRID dataset, looking at both result quality and metric scores.

Contribution

The study reimplements LipGAN in PyTorch and introduces L1WGAN-GP, a novel variation tailored for lip-synchronization tasks, providing a comparative analysis.

Findings

01 L1WGAN-GP outperforms LipGAN in certain metrics

02 Reimplementation in PyTorch ensures reproducibility

03 Comparison highlights strengths and weaknesses of each model

Abstract

We test the performance of GAN models for lip-synchronization. To this end, we reimplement LipGAN in PyTorch, train it on the GRID dataset, and compare it to our own variation, L1WGAN-GP, which is adapted to the LipGAN architecture and also trained on GRID.

Taxonomy

Topics: Speech and Audio Processing · Music and Audio Processing · Speech Recognition and Synthesis

Methods: Test · LipGAN