AVI-Talking: Learning Audio-Visual Instructions for Expressive 3D Talking Face Generation

Yasheng Sun, Wenqing Chu, Hang Zhou, Kaisiyuan Wang, Hideki Koike

arXiv:2402.16124 · cs.CV · February 27, 2024 · 1 citation

TL;DR

AVI-Talking introduces a novel system that uses large language models and diffusion networks to generate expressive 3D talking faces aligned with speech, enhancing realism and emotional consistency.

Contribution

The paper presents a two-stage audio-visual instruction approach leveraging LLMs and diffusion models for expressive 3D talking face generation, improving interpretability and flexibility.

Findings

01. Effective synthesis of expressive facial movements
02. Enhanced emotional consistency in generated faces
03. Flexible instruction-based face customization

Abstract

While considerable progress has been made in achieving accurate lip synchronization for 3D speech-driven talking face generation, the task of incorporating expressive facial detail synthesis aligned with the speaker's speaking status remains challenging. Our goal is to directly leverage the inherent style information conveyed by human speech for generating an expressive talking face that aligns with the speaking status. In this paper, we propose AVI-Talking, an Audio-Visual Instruction system for expressive Talking face generation. This system harnesses the robust contextual reasoning and hallucination capability offered by Large Language Models (LLMs) to instruct the realistic synthesis of 3D talking faces. Instead of directly learning facial movements from human speech, our two-stage strategy involves the LLMs first comprehending audio information and generating instructions implying…
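The two-stage split described above (speech is first turned into an instruction, and a generator then produces the face from that instruction) can be illustrated with a minimal Python sketch. Everything in it is hypothetical. `describe_speech` is a rule-based stand-in for the LLM stage. `synthesize` is a trivial stand-in for the diffusion-based generator named in the TL;DR. The per-frame coefficient lists are placeholder data. None of these names come from the paper. The sketch only shows the interface: the intermediate instruction is plain text, so a user can inspect or edit it before the second stage runs.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence, Tuple

Frames = List[List[float]]


@dataclass
class Instruction:
    # Human-readable description of the intended facial expression (hypothetical format).
    text: str


def describe_speech(emotion: str, intensity: float) -> Instruction:
    """Stage 1 stub (hypothetical stand-in for the LLM): speech cues -> text instruction."""
    degree = "strongly" if intensity > 0.7 else "slightly"
    return Instruction(f"{degree} {emotion} expression")


def synthesize(instruction: Instruction, lip_frames: Sequence[Sequence[float]]) -> Frames:
    """Stage 2 stub (hypothetical stand-in for the instruction-conditioned generator).

    Adds a uniform expression offset to every lip-synced coefficient; a real
    generator would sample detailed facial motion conditioned on the instruction.
    """
    offset = 0.2 if instruction.text.startswith("strongly") else 0.05
    return [[c + offset for c in frame] for frame in lip_frames]


def run_pipeline(
    emotion: str,
    intensity: float,
    lip_frames: Sequence[Sequence[float]],
    edit: Optional[Callable[[Instruction], Instruction]] = None,
) -> Tuple[Instruction, Frames]:
    """Run stage 1, optionally let a user rewrite the instruction, then run stage 2."""
    instruction = describe_speech(emotion, intensity)
    if edit is not None:
        # Customization point: the text instruction can be replaced before generation.
        instruction = edit(instruction)
    return instruction, synthesize(instruction, lip_frames)


if __name__ == "__main__":
    frames = [[0.0, 0.1], [0.2, 0.3]]
    instr, out = run_pipeline("happy", 0.9, frames)
    print(instr.text)  # strongly happy expression
    edited, _ = run_pipeline("happy", 0.9, frames, edit=lambda i: Instruction("slightly sad expression"))
    print(edited.text)  # slightly sad expression
```

Keeping the intermediate representation as text, rather than a latent vector, is what makes the customization step in `run_pipeline` a simple string replacement. This mirrors the interpretability and flexibility claims listed above, although the paper's actual instruction format is not shown on this page.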

Taxonomy

Topics: Face recognition and analysis · Generative Adversarial Networks and Image Synthesis · Advanced Image and Video Retrieval Techniques