PRISM: Streaming Human Motion Generation with Per-Joint Latent Decomposition

Zeyu Ling; Qing Shuai; Teng Zhang; Shiyang Li; Bo Han; Changqing Zou

arXiv:2603.08590 · cs.CV · March 11, 2026

TL;DR

PRISM introduces a structured joint-factorized latent space and noise-free condition injection, enabling high-quality, versatile, and streaming human motion generation from text and pose inputs, with improved long-horizon synthesis.
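The TL;DR does not say how noise-free condition injection works. The sketch below is one minimal reading. It assumes a standard DDPM-style forward process and a denoiser that accepts a per-token timestep, and both of those are assumptions. Conditioned tokens enter the model clean with timestep 0, while the tokens still being generated stay noisy at step t. The helper names `add_noise` and `inject_conditions` are hypothetical. A common alternative is to noise the condition tokens to the same level t, which partially corrupts the information they carry.

```python
import math
import random
from typing import List, Sequence, Tuple


def add_noise(x0: Sequence[float], alpha_bar: float, eps: Sequence[float]) -> List[float]:
    """DDPM forward process: x_t = sqrt(alpha_bar) * x0 + sqrt(1 - alpha_bar) * eps."""
    a, s = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return [a * x + s * e for x, e in zip(x0, eps)]


def inject_conditions(
    x_t: Sequence[float], t: int, cond: Sequence[float], mask: Sequence[bool]
) -> Tuple[List[float], List[int]]:
    """Hypothetical noise-free injection.

    Tokens with mask=True receive their clean condition value and timestep 0.
    All other tokens keep the noisy value from x_t and the current timestep t.
    """
    tokens = [c if m else x for x, c, m in zip(x_t, cond, mask)]
    timesteps = [0 if m else t for m in mask]
    return tokens, timesteps


random.seed(0)
x0 = [0.5, -1.0, 2.0, 0.0]          # toy latent tokens
mask = [True, True, False, False]   # the first two tokens are given conditions
eps = [random.gauss(0.0, 1.0) for _ in x0]
x_t = add_noise(x0, alpha_bar=0.3, eps=eps)
tokens, steps = inject_conditions(x_t, t=700, cond=x0, mask=mask)
print(tokens[:2], steps)  # [0.5, -1.0] [0, 0, 700, 700]
```

The denoiser therefore sees the conditions exactly at every sampling step and only needs to infer the free tokens.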

Contribution

PRISM proposes a joint-factorized latent space and a noise-free condition-injection method. Together they unify text-to-motion, pose-conditioned, and long-horizon motion generation in a single model.
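The contribution does not say how a single model is told which task to perform. One hypothetical reading is that a per-joint (frame, joint) token grid lets every task become a mask of given tokens:

* Pure text-to-motion gives no tokens.
* Keyframe posing or prefix continuation gives whole frames.
* A root trajectory gives one joint across all frames.

The function names, the toy sizes, and the choice of joint 0 as the root are all assumptions made for illustration.

```python
from typing import Iterable, List, Optional

Mask = List[List[bool]]


def make_condition_mask(
    num_frames: int,
    num_joints: int,
    known_frames: Iterable[int] = (),
    known_joints: Optional[Iterable[int]] = None,
) -> Mask:
    """Boolean (frame, joint) grid in which True marks a token supplied as a condition.

    known_frames: frames whose tokens are given.
    known_joints: if set, only these joints are given at those frames;
    otherwise every joint at a known frame is given.
    """
    frames = set(known_frames)
    joints = set(range(num_joints)) if known_joints is None else set(known_joints)
    return [[f in frames and j in joints for j in range(num_joints)]
            for f in range(num_frames)]


def count_given(mask: Mask) -> int:
    """Number of conditioned tokens in the grid."""
    return sum(sum(row) for row in mask)


T, J = 8, 4  # toy sizes: 8 frames, 4 joints (joint 0 is assumed to be the root)
text_only = make_condition_mask(T, J)                            # text-to-motion
keyframes = make_condition_mask(T, J, known_frames=[0, T - 1])   # pose keyframes
continuation = make_condition_mask(T, J, known_frames=range(3))  # extend a prefix
root_path = make_condition_mask(T, J, known_frames=range(T), known_joints=[0])
print([count_given(m) for m in (text_only, keyframes, continuation, root_path)])
# [0, 8, 12, 8]
```

With a monolithic per-frame latent, only the whole-frame masks would be expressible. The per-joint grid also allows partial-body conditions such as the root-only case.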

Findings

1. Achieves state-of-the-art results on multiple benchmarks.
2. Enables seamless text-to-motion and pose-conditioned generation.
3. Supports autoregressive streaming synthesis with reduced drift.
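Streaming synthesis of this kind is usually organized as a chunked rollout. Each new chunk is generated from the tail of the motion already emitted and is handed to the consumer immediately. The sketch below follows that assumption. `stream_motion`, `toy_generator`, and the chunk and context sizes are hypothetical. The loop shows only the data flow. Any drift reduction would have to come from the model and its conditioning, which this sketch does not model.

```python
from typing import Callable, Iterator, List

Frame = List[float]
ChunkFn = Callable[[List[Frame], int], List[Frame]]


def stream_motion(generate_chunk: ChunkFn, num_chunks: int,
                  chunk_len: int, context_len: int) -> Iterator[List[Frame]]:
    """Chunked autoregressive rollout that yields each chunk as soon as it exists.

    generate_chunk(context, n) is a hypothetical model call. It receives the last
    `context_len` emitted frames as a clean prefix and returns n new frames.
    """
    history: List[Frame] = []
    for _ in range(num_chunks):
        # Guard against history[-0:], which would return the whole history.
        context = history[-context_len:] if context_len > 0 else []
        chunk = generate_chunk(context, chunk_len)
        if len(chunk) != chunk_len:
            raise ValueError("generator returned the wrong number of frames")
        history.extend(chunk)
        yield chunk


def toy_generator(context: List[Frame], n: int) -> List[Frame]:
    """Stand-in model that continues a 1-D value from the last context frame."""
    start = context[-1][0] if context else 0.0
    return [[start + i + 1.0] for i in range(n)]


chunks = list(stream_motion(toy_generator, num_chunks=3, chunk_len=4, context_len=2))
print([frame[0] for chunk in chunks for frame in chunk])
# [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0, 11.0, 12.0]
```

Because `stream_motion` is a generator, a renderer can consume each chunk while the next one is still being produced.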

Abstract

Text-to-motion generation has advanced rapidly, yet two challenges persist. First, existing motion autoencoders compress each frame into a single monolithic latent vector. This entangles trajectory and per-joint rotations in an unstructured representation that downstream generators struggle to model faithfully. Second, text-to-motion, pose-conditioned generation, and long-horizon sequential synthesis typically require separate models or task-specific mechanisms, and autoregressive approaches suffer from severe error accumulation over extended rollouts. We present PRISM, which addresses each challenge with a dedicated contribution. (1) A joint-factorized motion latent space: each body joint occupies its own token, forming a structured 2D grid (time × joints) compressed by a causal VAE with forward-kinematics supervision. This simple change to the latent space -- without modifying the…
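The abstract says the causal VAE is trained with forward-kinematics supervision. This sketch assumes a common form of such a loss:

1. Decode a local rotation for each joint.
2. Chain those rotations down the skeleton to obtain 3D joint positions.
3. Penalize the position error.

In a per-joint layout, each (frame, joint) token would map to one such rotation. The helper names, the three-joint toy chain, and the use of z-axis rotation matrices are illustrative assumptions, not details from the paper.

```python
import math
from typing import List, Sequence

Mat3 = List[List[float]]
Vec3 = List[float]


def matmul(a: Mat3, b: Mat3) -> Mat3:
    """3x3 matrix product a @ b."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)] for i in range(3)]


def matvec(a: Mat3, v: Sequence[float]) -> Vec3:
    """3x3 matrix times 3-vector."""
    return [sum(a[i][k] * v[k] for k in range(3)) for i in range(3)]


def rot_z(theta: float) -> Mat3:
    """Rotation by theta radians about the z axis."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def forward_kinematics(parents: Sequence[int], offsets: Sequence[Vec3],
                       local_rots: Sequence[Mat3], root_pos: Vec3) -> List[Vec3]:
    """World-space joint positions for one frame.

    Assumes parents[0] == -1 (root) and parents[j] < j for every other joint.
    offsets[j] is the bone vector from the parent to joint j in the parent's frame.
    """
    glob: List[Mat3] = []
    pos: List[Vec3] = []
    for j, p in enumerate(parents):
        if p < 0:
            glob.append(local_rots[j])
            pos.append(list(root_pos))
        else:
            glob.append(matmul(glob[p], local_rots[j]))
            step = matvec(glob[p], offsets[j])
            pos.append([a + b for a, b in zip(pos[p], step)])
    return pos


def fk_loss(pred_pos: Sequence[Vec3], true_pos: Sequence[Vec3]) -> float:
    """Mean squared joint-position error."""
    sq = [sum((a - b) ** 2 for a, b in zip(p, q)) for p, q in zip(pred_pos, true_pos)]
    return sum(sq) / len(sq)


parents = [-1, 0, 1]                                           # root -> joint 1 -> joint 2
offsets = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [1.0, 0.0, 0.0]]
identity = rot_z(0.0)
straight = forward_kinematics(parents, offsets, [identity] * 3, [0.0, 0.0, 0.0])
bent = forward_kinematics(parents, offsets,
                          [rot_z(math.pi / 2), identity, identity], [0.0, 0.0, 0.0])
print(round(fk_loss(bent, straight), 3))  # 3.333
```

A 90° error at the root moves every descendant joint, so the position loss weights rotation errors by how much of the body they displace. A per-joint rotation loss alone does not do this.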

Code & Models

Models

ZeyuLing/PRISM-TP2M-1.4B (Hugging Face model)

Taxonomy

Topics: Human Motion and Animation · Generative Adversarial Networks and Image Synthesis · Human Pose and Action Recognition