SkyReels-A1: Expressive Portrait Animation in Video Diffusion Transformers
Di Qiu, Zhengcong Fei, Rui Wang, Jialin Bai, Changqian Yu, Mingyuan Fan, Guibin Chen, Xiang Wen

TL;DR
SkyReels-A1 is a portrait animation framework built on a video diffusion Transformer (DiT). It targets three weaknesses of earlier methods: identity preservation, facial expression accuracy, and temporal stability, and it relies on multi-stage training.
Contribution
It introduces an expression-aware conditioning module and a facial image-text alignment module to address common failures in portrait animation, advancing the state of the art in visual coherence and diversity.
Findings
Enhanced identity retention and facial motion transfer accuracy.
Improved temporal coherence and visual stability in animations.
Versatile application potential in virtual avatars and digital media.
Abstract
We present SkyReels-A1, a simple yet effective framework built upon video diffusion Transformer to facilitate portrait image animation. Existing methodologies still encounter issues, including identity distortion, background instability, and unrealistic facial dynamics, particularly in head-only animation scenarios. Besides, extending to accommodate diverse body proportions usually leads to visual inconsistencies or unnatural articulations. To address these challenges, SkyReels-A1 capitalizes on the strong generative capabilities of video DiT, enhancing facial motion transfer precision, identity retention, and temporal coherence. The system incorporates an expression-aware conditioning module that enables seamless video synthesis driven by expression-guided landmark inputs. Integrating the facial image-text alignment module strengthens the fusion of facial attributes with motion…
Taxonomy
Topics: 3D Surveying and Cultural Heritage
Methods: Attention Is All You Need · Byte Pair Encoding · Layer Normalization · Residual Connection · Linear Layer · Dense Connections · Multi-Head Attention · Diffusion · Position-Wise Feed-Forward Layer · Adam
