SingingHead: A Large-scale 4D Dataset for Singing Head Animation

Sijing Wu; Yunhao Li; Weitian Zhang; Jun Jia; Yucheng Zhu; Yichao Yan; Guangtao Zhai; Xiaokang Yang

arXiv:2312.04369 · cs.CV · July 16, 2024 · 2 cites

TL;DR

This paper introduces SingingHead, a large-scale dataset for singing head animation, and proposes UniSinger, a unified framework for 3D and 2D singing face synthesis, advancing audio-driven facial animation research.

Contribution

The paper provides the first large-scale singing head dataset and a unified model for 3D and 2D singing face animation, bridging the gap between singing and talking face synthesis.

Findings

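1. The SingingHead dataset contains more than 27 hours of synchronized singing video, 3D facial motion, singing audio, and background music from 76 individuals.

2. Benchmarking shows that existing audio-driven 3D facial animation methods and 2D talking head methods perform suboptimally on the singing task.

3. UniSinger achieves competitive results in both 3D and 2D singing face synthesis.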

Abstract

Singing, a common facial movement second only to talking, can be regarded as a universal language across ethnicities and cultures, and it plays an important role in emotional communication, art, and entertainment. However, it is often overlooked in the field of audio-driven facial animation because singing head datasets are lacking and because singing differs from talking in rhythm and amplitude. To this end, we collect a high-quality large-scale singing head dataset, SingingHead, which consists of more than 27 hours of synchronized singing video, 3D facial motion, singing audio, and background music from 76 individuals and 8 types of music. Along with the SingingHead dataset, we benchmark existing audio-driven 3D facial animation methods and 2D talking head methods on the singing task. Furthermore, we argue that 3D and 2D facial animation tasks can be solved together, and propose a…

Taxonomy

Topics: Face recognition and analysis · Speech and Audio Processing · Human Motion and Animation