All's well that FID's well? Result quality and metric scores in GAN models for lip-synchronization tasks
Carina Geldhauser, Johan Liljegren, Pontus Nordqvist

TL;DR
This paper evaluates GAN models for lip-synchronization, comparing LipGAN with a new variation, L1WGAN-GP, on the GRID dataset in terms of result quality and metric scores.
Contribution
The study reimplements LipGAN in PyTorch and introduces L1WGAN-GP, a novel variation tailored for lip-synchronization tasks, providing a comparative analysis.
Findings
L1WGAN-GP outperforms LipGAN in certain metrics
Reimplementation in PyTorch ensures reproducibility
Comparison highlights strengths and weaknesses of each model
Abstract
We test the performance of GAN models for lip-synchronization. For this, we reimplement LipGAN in PyTorch, train it on the GRID dataset, and compare it to our own variation, L1WGAN-GP, adapted to the LipGAN architecture and also trained on GRID.
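
The abstract does not spell out the L1WGAN-GP objective. As a rough illustration of what the name suggests, the sketch below combines a standard WGAN-GP critic loss (Wasserstein estimate plus gradient penalty) with an L1 reconstruction term on the generator side. This is an assumption about the general recipe, not the authors' implementation: the function names, the toy critic in the usage note, and the weight `l1_weight=100.0` are hypothetical, while `gp_weight=10.0` is the default commonly used for WGAN-GP.

```python
import torch
import torch.nn as nn


def gradient_penalty(critic, real, fake):
    """WGAN-GP penalty: push the critic's gradient norm toward 1 on
    random interpolations between real and generated samples."""
    batch = real.size(0)
    # One mixing coefficient per sample, broadcast over the remaining dims.
    eps = torch.rand(batch, *([1] * (real.dim() - 1)), device=real.device)
    mixed = eps * real.detach() + (1 - eps) * fake.detach()
    mixed.requires_grad_(True)
    scores = critic(mixed)
    # Gradient of the critic output with respect to the interpolated input;
    # create_graph=True keeps the penalty differentiable for the critic update.
    grads, = torch.autograd.grad(
        outputs=scores.sum(), inputs=mixed, create_graph=True
    )
    grad_norm = grads.reshape(batch, -1).norm(2, dim=1)
    return ((grad_norm - 1) ** 2).mean()


def critic_loss(critic, real, fake, gp_weight=10.0):
    """Wasserstein critic loss plus gradient penalty (minimised by the critic)."""
    fake = fake.detach()  # do not backpropagate into the generator here
    wasserstein = critic(fake).mean() - critic(real).mean()
    return wasserstein + gp_weight * gradient_penalty(critic, real, fake)


def generator_loss(critic, fake, target, l1_weight=100.0):
    """Adversarial term plus L1 reconstruction term.
    l1_weight is a hypothetical value, not taken from the paper."""
    adversarial = -critic(fake).mean()
    reconstruction = nn.functional.l1_loss(fake, target)
    return adversarial + l1_weight * reconstruction
```

With a toy critic such as `nn.Sequential(nn.Flatten(), nn.Linear(48, 1))` and image batches of shape `(4, 3, 4, 4)`, both loss functions return scalar tensors that can be passed to `backward()`. In an actual lip-synchronization setup the critic and generator would also take audio input, which this sketch leaves out.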
Taxonomy
Topics: Speech and Audio Processing · Music and Audio Processing · Speech Recognition and Synthesis
Methods: Test · LipGAN
