SoulX-LiveAct: Towards Hour-Scale Real-Time Human Animation with Neighbor Forcing and ConvKV Memory

Dingcheng Zhen; Xu Zheng; Ruixin Zhang; Zhiqi Jiang; Yichao Yan; Ming Tao; Shunshun Yin

arXiv:2603.11746·cs.CV·March 20, 2026


TL;DR

This paper introduces SoulX-LiveAct, an autoregressive (AR) diffusion framework for hour-scale real-time human animation. It improves training stability, inference efficiency, and animation quality, and supports 20 FPS streaming on 2 GPUs.

Contribution

The paper proposes two components: Neighbor Forcing, for stable, diffusion-step-consistent propagation, and ConvKV memory, for constant-memory inference. Together they target hour-scale real-time human animation.

Findings

1. Supports 20 FPS real-time streaming on 2 GPUs.
2. Achieves state-of-the-art lip-sync and animation quality.
3. Significantly improves training convergence and inference efficiency.

Abstract

Autoregressive (AR) diffusion models offer a promising framework for sequential generation tasks such as video synthesis by combining diffusion modeling with causal inference. Although they support streaming generation, existing AR diffusion methods struggle to scale efficiently. In this paper, we identify two key challenges in hour-scale real-time human animation. First, most forcing strategies propagate sample-level representations with mismatched diffusion states, causing inconsistent learning signals and unstable convergence. Second, historical representations grow unbounded and lack structure, preventing effective reuse of cached states and severely limiting inference efficiency. To address these challenges, we propose Neighbor Forcing, a diffusion-step-consistent AR formulation that propagates temporally adjacent frames as latent neighbors under the same noise condition. This…
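
The "same noise condition" idea behind Neighbor Forcing can be illustrated with a toy sketch. This is not the authors' implementation: the function names, the four-value noise schedule, and the scalar-list "latents" are all hypothetical, and the abstract as shown here does not specify the actual formulation. The sketch only shows the contrast the abstract draws, namely that a frame and its temporal neighbor are noised at one shared diffusion step rather than at mismatched steps.

```python
import math
import random


def add_noise(latent, alpha_bar, rng):
    """Standard forward diffusion: sqrt(a) * x + sqrt(1 - a) * eps, eps ~ N(0, 1)."""
    scale_x = math.sqrt(alpha_bar)
    scale_eps = math.sqrt(1.0 - alpha_bar)
    return [scale_x * x + scale_eps * rng.gauss(0.0, 1.0) for x in latent]


def neighbor_pairs(frames, alpha_bars, rng):
    """Build (step, noisy_prev, noisy_cur) tuples for adjacent frames.

    Assumption for illustration: one diffusion step is sampled per pair and
    applied to BOTH the previous frame and the current frame, so the context
    and the target share the same noise level.
    """
    pairs = []
    for prev, cur in zip(frames, frames[1:]):
        step = rng.randrange(len(alpha_bars))  # one shared step per pair
        a = alpha_bars[step]
        pairs.append((step, add_noise(prev, a, rng), add_noise(cur, a, rng)))
    return pairs


rng = random.Random(0)
frames = [[float(i)] * 4 for i in range(5)]  # 5 toy "latent frames" of size 4
alpha_bars = [0.99, 0.9, 0.5, 0.1]           # hypothetical cumulative schedule
pairs = neighbor_pairs(frames, alpha_bars, rng)
```

Each tuple in `pairs` holds a single step index and two noisy latents produced with that step's `alpha_bar`. A scheme with mismatched diffusion states would instead draw a separate step for each frame.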

Code & Models

Models

Soul-AILab/LiveAct (Hugging Face model; 1.1k downloads, 12 likes)

Taxonomy

Topics: Human Motion and Animation · Generative Adversarial Networks and Image Synthesis · Face recognition and analysis