TL;DR
Live Avatar introduces a real-time, streaming avatar generation system using a 14-billion-parameter diffusion model, combining algorithmic innovations and system optimizations for infinite-length, high-quality avatar streaming.
Contribution
The paper presents a novel co-designed framework with a two-stage pipeline and Timestep-forcing Pipeline Parallelism, enabling practical real-time streaming of large diffusion models.
Findings
Achieves 45 FPS with a 1.21 s time to first frame (TTFF) on 5 GPUs.
Enables stable autoregressive generation exceeding 10,000 seconds.
First to enable real-time streaming of a 14B diffusion model for avatars.
Abstract
Audio-driven avatar interaction demands real-time, streaming, and infinite-length generation -- capabilities fundamentally at odds with the sequential denoising and long-horizon drift of current diffusion models. We present Live Avatar, an algorithm-system co-designed framework that addresses both challenges for a 14-billion-parameter diffusion model. On the algorithm side, a two-stage pipeline distills a pretrained bidirectional model into a causal, few-step streaming one, while a set of complementary long-horizon strategies eliminate identity drift and visual artifacts, enabling stable autoregressive generation exceeding 10,000 seconds. On the system side, Timestep-forcing Pipeline Parallelism (TPP) assigns each GPU a fixed denoising timestep, converting the sequential diffusion chain into an asynchronous spatial pipeline that simultaneously boosts throughput and improves temporal…
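The pipelining idea in the abstract, where each GPU is pinned to one denoising timestep and blocks of frames flow from GPU to GPU, can be illustrated with a toy schedule. This is a minimal sketch, not the paper's implementation: the function names, the lock-step "tick" clock, and the choice of 4 stages and 6 blocks are all assumptions made for illustration, and the sketch ignores caching, communication cost, and the asynchronous execution the abstract describes.

```python
from typing import Dict, List, Optional


def tpp_schedule(num_stages: int, num_blocks: int) -> List[List[Optional[int]]]:
    """Return, for each tick, the block index each stage is working on.

    Hypothetical model: stage s always applies denoising step s, and
    block b reaches stage s at tick b + s. None means the stage is idle.
    """
    total_ticks = num_blocks + num_stages - 1
    schedule: List[List[Optional[int]]] = []
    for tick in range(total_ticks):
        row: List[Optional[int]] = []
        for stage in range(num_stages):
            block = tick - stage
            row.append(block if 0 <= block < num_blocks else None)
        schedule.append(row)
    return schedule


def completion_ticks(
    schedule: List[List[Optional[int]]], num_stages: int
) -> Dict[int, int]:
    """Map each block to the number of ticks elapsed when it leaves the last stage."""
    done: Dict[int, int] = {}
    for tick, row in enumerate(schedule):
        block = row[num_stages - 1]
        if block is not None:
            done[block] = tick + 1
    return done


if __name__ == "__main__":
    stages, blocks = 4, 6  # illustrative values, not taken from the paper
    sched = tpp_schedule(stages, blocks)
    for tick, row in enumerate(sched):
        print(f"tick {tick}: {row}")
    print(completion_ticks(sched, stages))
```

In this toy model the first block still needs as many ticks as there are stages, but once the pipeline is full one block finishes per tick, whereas running all denoising steps for a block on one device before starting the next would finish one block every `num_stages` ticks. That is the throughput argument behind turning the sequential chain into a spatial pipeline; the real latency and frame-rate figures depend on details this sketch does not model.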
