KAN-Based Fusion of Dual-Domain for Audio-Driven Facial Landmarks Generation

Hoang-Son Vo-Thanh, Quang-Vinh Nguyen, and Soo-Hyung Kim

arXiv:2409.05330·cs.CV·September 10, 2024

TL;DR

This paper introduces the KFusion of Dual-Domain model, which effectively generates facial landmarks from audio by separating emotional and contextual information, improving stability and efficiency in audio-driven talking face synthesis.

Contribution

The paper proposes a novel dual-domain fusion approach using the KAN model to enhance landmark generation from audio for talking face synthesis.

Findings

01. High efficiency compared to recent models
02. Effective separation of emotional and facial context information
03. Improved stability of landmark generation

Abstract

Audio-driven talking face generation is a widely researched topic due to its high applicability. Reconstructing a talking face from audio contributes significantly to fields such as education, healthcare, online conversations, virtual assistants, and virtual reality. Early studies often focused solely on changing the mouth movements, which limited their practical applications. Recently, researchers have proposed constructing the entire face, including face pose, neck, and shoulders, which requires generating the face through landmarks. However, creating stable landmarks that align well with the audio is a challenge. In this paper, we propose the KFusion of Dual-Domain model, a robust model that generates landmarks from audio. We separate the audio into two distinct domains to learn emotional information and facial context, then use a fusion…

Code & Models

Repositories

sowwnn/KFusion-Dual-Domain-for-Speech-to-Landmarks (PyTorch, official)

Taxonomy

Topics: Face recognition and analysis · Speech and Audio Processing

Methods: ALIGN