JoyVASA: Portrait and Animal Image Animation with Diffusion-Based Audio-Driven Facial Dynamics and Head Motion Generation

Xuyang Cao; Guoxin Wang; Sheng Shi; Jun Zhao; Yang Yao; Jintao Fei; Minyu Gao; Pei Xie

arXiv:2411.09209·cs.CV·April 17, 2026

TL;DR

JoyVASA introduces a diffusion-based framework for audio-driven animation of human portraits and animal faces. It decouples static facial features from dynamic ones, so motion is generated independently of identity, which enables longer, high-quality, multilingual videos.

Contribution

The paper proposes a decoupled facial representation and a diffusion transformer that together extend animation to animal faces and improve video length and quality.

Findings

01. Effective decoupling of static and dynamic facial features.

02. Multilingual support with diverse datasets.

03. Seamless animation of animal faces alongside human portraits.

Abstract

Audio-driven portrait animation has made significant advances with diffusion-based models, improving video quality and lipsync accuracy. However, the increasing complexity of these models has led to inefficiencies in training and inference, as well as constraints on video length and inter-frame continuity. In this paper, we propose JoyVASA, a diffusion-based method for generating facial dynamics and head motion in audio-driven facial animation. Specifically, in the first stage, we introduce a decoupled facial representation framework that separates dynamic facial expressions from static 3D facial representations. This decoupling allows the system to generate longer videos by combining any static 3D facial representation with dynamic motion sequences. Then, in the second stage, a diffusion transformer is trained to generate motion sequences directly from audio cues, independent of…
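The decoupling described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the names `StaticFace`, `MotionFrame`, and `animate` are hypothetical, the keypoint layout is assumed, and head rotation is left out to keep the example short. The sketch only shows the idea that one identity-independent motion sequence can be combined with any static face representation.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Point3D = Tuple[float, float, float]


@dataclass(frozen=True)
class StaticFace:
    """Hypothetical identity-specific part: canonical 3D keypoints of one face."""
    keypoints: List[Point3D]


@dataclass(frozen=True)
class MotionFrame:
    """Hypothetical identity-independent part for one video frame."""
    expression: List[Point3D]  # per-keypoint expression offsets
    translation: Point3D       # head translation (rotation omitted for brevity)


def animate(face: StaticFace, motion: Sequence[MotionFrame]) -> List[List[Point3D]]:
    """Combine one static face with a motion sequence, one keypoint set per frame."""
    frames = []
    for step in motion:
        if len(step.expression) != len(face.keypoints):
            raise ValueError("motion and face must use the same number of keypoints")
        tx, ty, tz = step.translation
        frames.append([
            (kx + ex + tx, ky + ey + ty, kz + ez + tz)
            for (kx, ky, kz), (ex, ey, ez) in zip(face.keypoints, step.expression)
        ])
    return frames


# The same motion sequence drives two different static faces.
motion = [
    MotionFrame(expression=[(0.0, 0.0, 0.0), (0.0, 0.0, 0.0)], translation=(0.0, 0.0, 0.0)),
    MotionFrame(expression=[(0.5, 0.0, 0.0), (0.5, 0.0, 0.0)], translation=(0.0, 1.0, 0.0)),
]
face_a = StaticFace(keypoints=[(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)])
face_b = StaticFace(keypoints=[(0.0, 0.0, 2.0), (3.0, 0.0, 2.0)])
frames_a = animate(face_a, motion)
frames_b = animate(face_b, motion)
```

Because the motion sequence carries no identity information in this sketch, its length alone determines the number of output frames, and it can be reused across faces. In the paper's second stage, such a sequence would come from the audio-conditioned diffusion transformer rather than being written by hand.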

Code & Models

Repositories

jdh-algo/JoyVASA (GitHub)

Models

jdh-algo/JoyVASA (Hugging Face model, 262 downloads, 39 likes)
