SFQA: A Comprehensive Perceptual Quality Assessment Dataset for Singing Face Generation
Zhilin Gao, Yunhao Li, Sijing Wu, Yucheng Zhu, Huiyu Duan, Guangtao Zhai

TL;DR
This paper introduces SFQA, a new dataset for evaluating the perceptual quality of singing face generation. It addresses the lack of specialized datasets and is used to benchmark existing quality assessment algorithms.
Contribution
The paper presents SFQA, a comprehensive dataset for singing face quality assessment, created with diverse methods, music styles, and subjective evaluations, filling a critical gap in the domain.
Findings
Quality varies significantly across generation methods.
Existing objective quality assessment algorithms are benchmarked on the dataset.
The dataset enables better evaluation and development of SFG methods.
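As a rough illustration of what benchmarking an objective quality metric against subjective evaluations can look like, the sketch below computes a Spearman rank correlation between a metric's predictions and human ratings. This is an assumption about the general approach, not a description of the paper's protocol: the summary does not say which correlation measures were used, and the function names and score lists are hypothetical.

```python
import math


def average_ranks(values):
    """Return 1-based ranks, assigning tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson linear correlation of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman(predicted, subjective):
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    return pearson(average_ranks(predicted), average_ranks(subjective))


# Hypothetical scores for five generated videos (illustrative values only).
metric_scores = [0.62, 0.48, 0.91, 0.30, 0.75]
human_ratings = [3.4, 3.0, 4.5, 2.1, 3.9]
print(f"Spearman correlation: {spearman(metric_scores, human_ratings):.3f}")
```

A value near 1 would indicate that the metric orders the videos much as human raters do; a value near 0 would indicate little agreement in ranking.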
Abstract
The Talking Face Generation task has enormous potential for applications such as digital humans and agents. Singing, a common facial movement second only to talking, can be regarded as a universal language across ethnicities and cultures. However, it is often underestimated in the field because of the lack of singing face datasets and the domain gap between singing and talking in rhythm and amplitude. More significantly, the quality of Singing Face Generation (SFG) often falls short and is uneven or limited by the applicable scenario, which calls for timely and effective quality assessment methods to ensure user experience. To address these gaps, this paper introduces SFQA, a new SFG content quality assessment dataset built using 12 representative generation methods. During the construction of the dataset, 100 photographs or portraits, as well as…
Taxonomy
Topics: Face recognition and analysis · Generative Adversarial Networks and Image Synthesis · Speech and Audio Processing
