Live Avatar: Streaming Real-time Audio-Driven Avatar Generation with Infinite Length

Yubo Huang; Hailong Guo; Fangtai Wu; Weiqiang Wang; Shifeng Zhang; Shijie Huang; Qijun Gan; Lin Liu; Sirui Zhao; Enhong Chen; Jiaming Liu; Steven Hoi

arXiv:2512.04677 · cs.CV · April 21, 2026

2 Repos 1 Models

TL;DR

Live Avatar introduces a real-time, streaming avatar generation system using a 14-billion-parameter diffusion model, combining algorithmic innovations and system optimizations for infinite-length, high-quality avatar streaming.

Contribution

The paper presents a novel co-designed framework with a two-stage pipeline and Timestep-forcing Pipeline Parallelism, enabling practical real-time streaming of large diffusion models.

Findings

01  Achieves 45 FPS with a 1.21 s time to first frame (TTFF) on 5 GPUs.

02  Enables stable autoregressive generation exceeding 10,000 seconds.

03  First to enable real-time streaming of a 14B diffusion model for avatars.

Abstract

Audio-driven avatar interaction demands real-time, streaming, and infinite-length generation -- capabilities fundamentally at odds with the sequential denoising and long-horizon drift of current diffusion models. We present Live Avatar, an algorithm-system co-designed framework that addresses both challenges for a 14-billion-parameter diffusion model. On the algorithm side, a two-stage pipeline distills a pretrained bidirectional model into a causal, few-step streaming one, while a set of complementary long-horizon strategies eliminate identity drift and visual artifacts, enabling stable autoregressive generation exceeding 10000 seconds. On the system side, Timestep-forcing Pipeline Parallelism (TPP) assigns each GPU a fixed denoising timestep, converting the sequential diffusion chain into an asynchronous spatial pipeline that simultaneously boosts throughput and improves temporal…
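The pipelining idea described in the abstract can be illustrated with a toy schedule. The sketch below is not the authors' implementation: it assumes every denoising step takes one equal time unit (a "tick"), ignores any dependency between consecutive blocks and any communication cost, and uses hypothetical names (`pipeline_schedule`, `sequential_ticks`, `pipelined_ticks`) and an illustrative stage count of 4 that is not taken from the paper. It only shows why pinning one denoising timestep to each device lets a finished block leave the pipeline on every tick once the pipeline is full.

```python
from typing import List, Optional


def pipeline_schedule(num_blocks: int, num_stages: int) -> List[List[Optional[int]]]:
    """Toy schedule: entry [tick][stage] is the block index that the stage
    (one device, one fixed denoising timestep) works on, or None if idle.

    Assumption: every stage takes exactly one tick per block.
    """
    total_ticks = num_blocks + num_stages - 1
    schedule = []
    for tick in range(total_ticks):
        row = []
        for stage in range(num_stages):
            block = tick - stage  # block b reaches stage s at tick b + s
            row.append(block if 0 <= block < num_blocks else None)
        schedule.append(row)
    return schedule


def sequential_ticks(num_blocks: int, num_stages: int) -> int:
    """Ticks needed when one device runs all denoising steps for each block."""
    return num_blocks * num_stages


def pipelined_ticks(num_blocks: int, num_stages: int) -> int:
    """Ticks needed when each denoising step lives on its own device."""
    return num_blocks + num_stages - 1


if __name__ == "__main__":
    blocks, stages = 10, 4  # illustrative values only
    for tick, row in enumerate(pipeline_schedule(blocks, stages)):
        print(tick, row)
    print("sequential:", sequential_ticks(blocks, stages))  # 40
    print("pipelined:", pipelined_ticks(blocks, stages))    # 13
```

Under these assumptions the steady-state output rate is one block per tick regardless of the number of denoising steps, while the latency of the first block still equals the full chain length. That is consistent with the page reporting throughput (FPS) and time to first frame as separate figures.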

Code & Models

Models

Quark-Vision/Live-Avatar (model, ♡ 242)
