FlashPortrait: 6x Faster Infinite Portrait Animation with Adaptive Latent Prediction
Shuyuan Tu, Yueming Pan, Yinming Huang, Xintong Han, Zhen Xing, Qi Dai, Kai Qiu, Chong Luo, Zuxuan Wu

TL;DR
FlashPortrait is a novel video diffusion transformer that produces ID-preserving, infinite-length portrait animations with up to 6x faster inference by adaptive latent prediction and a sliding-window scheme for smooth long-term consistency.
Contribution
It introduces a new method combining normalized facial expression features and higher-order latent derivatives for accelerated, ID-preserving long portrait animation.
Findings
Achieves up to 6x inference speedup.
Maintains high ID consistency in long videos.
Outperforms existing methods on benchmark datasets.
Abstract
Current diffusion-based acceleration methods for long-portrait animation struggle to ensure identity (ID) consistency. This paper presents FlashPortrait, an end-to-end video diffusion transformer capable of synthesizing ID-preserving, infinite-length videos while achieving up to 6x acceleration in inference speed. In particular, FlashPortrait begins by computing the identity-agnostic facial expression features with an off-the-shelf extractor. It then introduces a Normalized Facial Expression Block to align facial features with diffusion latents by normalizing them with their respective means and variances, thereby improving identity stability in facial modeling. During inference, FlashPortrait adopts a dynamic sliding-window scheme with weighted blending in overlapping areas, ensuring smooth transitions and ID consistency in long animations. In each context window, based on the latent…
Peer Reviews
No public reviews on file for this paper yet. If you reviewed it on a platform where reviews are public (OpenReview, ICLR, NeurIPS, ICML), you can paste yours below so the community can read it here.
Code & Models
Videos
No videos yet. Explain this paper in a talk, walkthrough, or lecture? Add one.
Taxonomy
TopicsGenerative Adversarial Networks and Image Synthesis · Face recognition and analysis · Emotion and Mood Recognition
