Analyzing the Feature Extractor Networks for Face Image Synthesis
Erdi Sarıtaş, Hazım Kemal Ekenel

TL;DR
This paper evaluates various feature extractors like InceptionV3, CLIP, DINOv2, and ArcFace for assessing the realism of face image synthesis, highlighting their behaviors and limitations across different metrics and datasets.
Contribution
It provides a comprehensive analysis of multiple feature extractors and their effectiveness in evaluating face image synthesis, addressing limitations of traditional methods like InceptionV3.
Findings
InceptionV3 shows limitations for face image evaluation.
Different feature extractors exhibit distinct behaviors in metrics.
Deep analysis reveals the impact of feature normalization and attention.
Abstract
Advances such as Generative Adversarial Networks have drawn researchers toward face image synthesis, with the aim of generating ever more realistic images. As a result, the need for evaluation criteria that assess the realism of generated images has become apparent. While FID computed with InceptionV3 features is one of the primary choices for benchmarking, concerns about InceptionV3's limitations for face images have emerged. This study investigates the behavior of diverse feature extractors (InceptionV3, CLIP, DINOv2, and ArcFace) across a variety of metrics (FID, KID, and Precision & Recall). The FFHQ dataset serves as the target domain; the source domains are the CelebA-HQ dataset and synthetic datasets generated with StyleGAN2 and Projected FastGAN. Experiments include deep-down analysis of the features: normalization, model attention during…
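As a rough illustration of how a metric such as FID depends only on the features produced by the chosen extractor, the sketch below computes the Fréchet distance between two feature arrays. It is a minimal NumPy sketch, not the authors' evaluation code: the function names `frechet_distance` and `l2_normalize` are hypothetical, the features are assumed to have been extracted beforehand as arrays of shape (n_images, feature_dim) with feature_dim of at least 2, and the optional L2 normalization is only one possible reading of the "normalization" analysis mentioned in the abstract.

```python
import numpy as np


def l2_normalize(feats, eps=1e-12):
    """Scale each feature vector (row) to unit L2 norm (assumed normalization)."""
    feats = np.asarray(feats, dtype=np.float64)
    norms = np.linalg.norm(feats, axis=1, keepdims=True)
    return feats / np.maximum(norms, eps)


def frechet_distance(feats_a, feats_b):
    """Fréchet distance between Gaussians fitted to two (n, d) feature arrays.

    FID is this quantity computed on extractor features:
        ||mu_a - mu_b||^2 + tr(C_a) + tr(C_b) - 2 tr((C_a C_b)^(1/2))
    """
    feats_a = np.asarray(feats_a, dtype=np.float64)
    feats_b = np.asarray(feats_b, dtype=np.float64)

    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)

    # tr((C_a C_b)^(1/2)) equals the sum of the square roots of the
    # eigenvalues of C_a @ C_b, which are real and non-negative in exact
    # arithmetic; clip small negative values caused by rounding.
    eigvals = np.linalg.eigvals(cov_a @ cov_b)
    trace_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()

    diff = mu_a - mu_b
    return float(diff @ diff + np.trace(cov_a) + np.trace(cov_b) - 2.0 * trace_sqrt)


# Stand-in random features; real use would pass extractor outputs instead.
rng = np.random.default_rng(0)
real_feats = rng.normal(size=(500, 8))
fake_feats = rng.normal(loc=0.5, size=(500, 8))
print(frechet_distance(real_feats, fake_feats))
print(frechet_distance(l2_normalize(real_feats), l2_normalize(fake_feats)))
```

Swapping the extractor changes only the arrays passed in, which is why the same metric can behave differently across InceptionV3, CLIP, DINOv2, and ArcFace features.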
Taxonomy
Topics: Face recognition and analysis · Face and Expression Recognition
Methods: Weight Demodulation · R1 Regularization · Path Length Regularization · Convolution · Additive Angular Margin Loss · Contrastive Language-Image Pre-training
