SFQA: A Comprehensive Perceptual Quality Assessment Dataset for Singing Face Generation

Zhilin Gao; Yunhao Li; Sijing Wu; Yucheng Zhu; Huiyu Duan; Guangtao Zhai

arXiv:2601.20385·cs.MM·January 29, 2026

TL;DR

This paper introduces SFQA, a new dataset for evaluating the perceptual quality of singing face generation. It addresses the lack of specialized datasets and benchmarks existing quality assessment algorithms on the new data.

Contribution

The paper presents SFQA, a comprehensive dataset for singing face quality assessment, built from diverse generation methods and music styles and rated through subjective evaluation, filling a critical gap in the domain.

Findings

01

Significant variation in quality among different generation methods.

02

Existing objective quality assessment algorithms are benchmarked on the dataset.

03

The dataset enables better evaluation and development of SFG methods.
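
Benchmarking an objective quality metric against subjective ratings is commonly done by correlating the metric's predictions with human scores. The sketch below illustrates that general idea only; it is not the paper's protocol. The function names, the choice of Pearson (PLCC) and Spearman (SRCC) correlation, and the toy score values are all assumptions made for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson linear correlation between two equal-length score lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical data: subjective scores for five generated videos and the
# scores an objective metric assigned to the same videos.
subjective_scores = [1.8, 2.5, 3.1, 3.9, 4.6]
metric_scores = [0.20, 0.35, 0.30, 0.70, 0.90]

plcc = pearson(subjective_scores, metric_scores)
srcc = spearman(subjective_scores, metric_scores)
print(f"PLCC={plcc:.3f}  SRCC={srcc:.3f}")
```

A metric whose ranking of videos matches the human ranking yields an SRCC close to 1, while PLCC additionally reflects how linear the relationship is.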

Abstract

Talking face generation has enormous potential for applications such as digital humans and agents. Singing, a common facial movement second only to talking, can be regarded as a universal language across ethnicities and cultures. However, it is often overlooked in the field, both because singing face datasets are scarce and because singing differs from talking in rhythm and amplitude. More significantly, the quality of Singing Face Generation (SFG) often falls short and is uneven or limited by the applicable scenario, which motivates timely and effective quality assessment methods to ensure user experience. To address these gaps, this paper introduces SFQA, a new SFG content quality assessment dataset built using 12 representative generation methods. During the construction of the dataset, 100 photographs or portraits, as well as…

Taxonomy

Topics: Face recognition and analysis · Generative Adversarial Networks and Image Synthesis · Speech and Audio Processing