LPM 1.0: Video-based Character Performance Model

Ailing Zeng; Casper Yang; Chauncey Ge; Eddie Zhang; Garvey Xu; Gavin Lin; Gilbert Gu; Jeremy Pi; Leo Li; Mingyi Shi; Shawn Wang; Sheng Bi; Steven Tang; Thorn Hang; Tobey Guo; Vincent Li; Xin Tong; Yikang Li; Yuchen Sun; Yue Zhao; Yuhan Lu; Yuwei Li; Zane Zhang; Zeshi Yang; Zi Ye

arXiv:2604.07823 · cs.CV · April 16, 2026

TL;DR

LPM 1.0 is a multimodal video model that generates real-time, identity-stable character performances for conversational scenarios, advancing the realism and controllability of virtual characters.

Contribution

It introduces a large-scale dataset, a 17B-parameter diffusion transformer, and a causal streaming generator for high-quality, real-time character performance in conversations.

Findings

01 Achieves state-of-the-art results in performance quality and stability.

02 Enables real-time, infinite-length character interactions.

03 Provides a new benchmark for interactive character performance.

Abstract

Performance, the externalization of intent, emotion, and personality through visual, vocal, and temporal behavior, is what makes a character alive. Learning such performance from video is a promising alternative to traditional 3D pipelines. However, existing video models struggle to jointly achieve high expressiveness, real-time inference, and long-horizon identity stability, a tension we call the performance trilemma. Conversation is the most comprehensive performance scenario, as characters simultaneously speak, listen, react, and emote while maintaining identity over time. To address this, we present LPM 1.0 (Large Performance Model), focusing on single-person full-duplex audio-visual conversational performance. Concretely, we build a multimodal human-centric dataset through strict filtering, speaking-listening audio-video pairing, performance understanding, and identity-aware…

Code & Models

Repositories

large-performance-model/large-performance-model.github.io (GitHub)
