InfiniDreamer: Arbitrarily Long Human Motion Generation via Segment Score Distillation

Wenjie Zhuo; Fan Ma; Hehe Fan

arXiv:2411.18303·cs.CV·October 27, 2025

TL;DR

InfiniDreamer is a framework that generates arbitrarily long human motion sequences by assembling sub-motions and refining them with Segment Score Distillation, leveraging a pre-trained short-clip motion prior without additional training.

Contribution

The paper introduces Segment Score Distillation (SSD), a training-free optimization method that refines a long motion sequence for coherence using a motion prior pre-trained only on short clips.

Findings

01. Generates coherent, long human motion sequences.

02. Outperforms existing methods in qualitative and quantitative evaluations.

03. Maintains global and local motion consistency across extended sequences.

Abstract

We present InfiniDreamer, a novel framework for arbitrarily long human motion generation. InfiniDreamer addresses the limitations of current motion generation methods, which are typically restricted to short sequences due to the lack of long motion training data. To achieve this, we first generate sub-motions corresponding to each textual description and then assemble them into a coarse, extended sequence using randomly initialized transition segments. We then introduce an optimization-based method called Segment Score Distillation (SSD) to refine the entire long motion sequence. SSD is designed to utilize an existing motion prior, which is trained only on short clips, in a training-free manner. Specifically, SSD iteratively refines overlapping short segments sampled from the coarsely extended long motion sequence, progressively aligning them with the pre-trained motion diffusion prior.…
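The assemble-then-refine loop described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the function names (`assemble`, `toy_prior_denoise`, `refine`, `roughness`), every parameter value, and the update rule are assumptions made for illustration, and a simple moving-average smoother stands in for the pre-trained motion diffusion prior. It only shows the overall shape of the idea: join sub-motions with randomly initialized transitions, then repeatedly sample overlapping short segments and nudge each one toward what the prior prefers.

```python
import numpy as np


def assemble(sub_motions, transition_len, rng):
    """Join sub-motions into one coarse sequence, with a randomly
    initialized transition segment between each consecutive pair."""
    feat_dim = sub_motions[0].shape[1]
    parts = []
    for i, motion in enumerate(sub_motions):
        if i > 0:
            parts.append(rng.standard_normal((transition_len, feat_dim)))
        parts.append(motion)
    return np.concatenate(parts, axis=0)


def toy_prior_denoise(noisy_segment, width=5):
    """HYPOTHETICAL stand-in for the pre-trained short-clip prior:
    a moving average over time (width must be odd).
    A real prior would be a motion diffusion model."""
    pad = width // 2
    padded = np.pad(noisy_segment, ((pad, pad), (0, 0)), mode="edge")
    kernel = np.ones(width) / width
    cols = [np.convolve(padded[:, d], kernel, mode="valid")
            for d in range(noisy_segment.shape[1])]
    return np.stack(cols, axis=1)


def refine(motion, seg_len, stride, steps, lr, noise_std, rng):
    """Iteratively refine overlapping short segments of a long sequence.
    Each step perturbs one segment with noise, asks the stand-in prior
    for a cleaned estimate, and moves the segment toward that estimate."""
    motion = motion.copy()
    # Overlapping windows: stride < seg_len means neighbours share frames.
    starts = list(range(0, len(motion) - seg_len + 1, stride))
    for _ in range(steps):
        s = starts[rng.integers(len(starts))]
        segment = motion[s:s + seg_len]
        noisy = segment + noise_std * rng.standard_normal(segment.shape)
        target = toy_prior_denoise(noisy)
        # Assumed update rule: a plain gradient-style step toward the target.
        motion[s:s + seg_len] -= lr * (segment - target)
    return motion


def roughness(motion):
    """Mean squared frame-to-frame difference (lower means smoother)."""
    return float(np.mean(np.diff(motion, axis=0) ** 2))


rng = np.random.default_rng(0)
t = np.linspace(0, 2 * np.pi, 40)[:, None]
walk = np.sin(t + np.arange(3))        # toy "sub-motion" 1, shape (40, 3)
wave = np.cos(2 * t + np.arange(3))    # toy "sub-motion" 2, shape (40, 3)

coarse = assemble([walk, wave], transition_len=10, rng=rng)
refined = refine(coarse, seg_len=20, stride=5, steps=300,
                 lr=0.5, noise_std=0.05, rng=rng)
print(roughness(coarse), roughness(refined))
```

Because the windows overlap, frames in a transition are updated as part of several different segments, which is what lets a prior that only sees short clips influence the whole sequence. In this toy the smoother merely removes jitter; a learned prior would instead pull each segment toward plausible human motion.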

Taxonomy

Topics: Human Pose and Action Recognition · Human Motion and Animation · Advanced Vision and Imaging

Methods: Non Maximum Suppression · 1x1 Convolution · Convolution · SSD · Attentive Walk-Aggregating Graph Neural Network · Diffusion