InfiniDreamer: Arbitrarily Long Human Motion Generation via Segment Score Distillation
Wenjie Zhuo, Fan Ma, Hehe Fan

TL;DR
InfiniDreamer is a framework that generates arbitrarily long human motion sequences by assembling sub-motions and refining them with Segment Score Distillation, leveraging a pre-trained short-clip motion prior without additional training.
Contribution
The paper introduces Segment Score Distillation (SSD), a training-free optimization method that refines a long motion sequence for coherence using a motion prior pre-trained only on short clips.
Findings
Successfully generates coherent, long human motion sequences.
Outperforms existing methods in qualitative and quantitative evaluations.
Maintains global and local motion consistency across extended sequences.
Abstract
We present InfiniDreamer, a novel framework for arbitrarily long human motion generation. InfiniDreamer addresses the limitations of current motion generation methods, which are typically restricted to short sequences due to the lack of long motion training data. To achieve this, we first generate sub-motions corresponding to each textual description and then assemble them into a coarse, extended sequence using randomly initialized transition segments. We then introduce an optimization-based method called Segment Score Distillation (SSD) to refine the entire long motion sequence. SSD is designed to utilize an existing motion prior, which is trained only on short clips, in a training-free manner. Specifically, SSD iteratively refines overlapping short segments sampled from the coarsely extended long motion sequence, progressively aligning them with the pre-trained motion diffusion prior.…
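The assemble-then-refine loop described in the abstract can be illustrated with a small NumPy sketch. This is not the authors' implementation: the function names, the window and step parameters, and above all `toy_denoiser` (a moving-average smoother standing in for the pre-trained, text-conditioned motion diffusion prior) are assumptions made for illustration. The update used here is a simplified, clean-sample form of a score-distillation step and may differ from the paper's exact objective.

```python
import numpy as np


def assemble_coarse_sequence(sub_motions, transition_len, rng):
    """Join sub-motions (each of shape [frames, dims]) into one long sequence,
    inserting a randomly initialized transition segment between neighbors."""
    pieces = []
    for i, sub in enumerate(sub_motions):
        pieces.append(sub)
        if i < len(sub_motions) - 1:
            pieces.append(rng.standard_normal((transition_len, sub.shape[1])))
    return np.concatenate(pieces, axis=0)


def toy_denoiser(noisy_segment, sigma):
    """HYPOTHETICAL stand-in for a pre-trained short-clip motion prior.
    It predicts a clean segment by 5-frame temporal smoothing and ignores
    sigma; a real diffusion prior would condition on noise level and text."""
    kernel = np.ones(5) / 5.0
    padded = np.pad(noisy_segment, ((2, 2), (0, 0)), mode="edge")
    columns = [np.convolve(padded[:, d], kernel, mode="valid")
               for d in range(noisy_segment.shape[1])]
    return np.stack(columns, axis=1)


def refine_long_motion(motion, segment_len=16, stride=8, steps=200,
                       lr=0.1, seed=0):
    """Iteratively refine overlapping short windows of a long sequence by
    nudging each sampled window toward the prior's denoised prediction."""
    rng = np.random.default_rng(seed)
    motion = motion.copy()
    starts = np.arange(0, motion.shape[0] - segment_len + 1, stride)
    for _ in range(steps):
        s = int(rng.choice(starts))                # sample one window
        segment = motion[s:s + segment_len]
        sigma = rng.uniform(0.01, 0.1)             # random noise level
        noisy = segment + sigma * rng.standard_normal(segment.shape)
        predicted = toy_denoiser(noisy, sigma)     # prior's clean estimate
        # Gradient-style step: move the window toward the prediction.
        motion[s:s + segment_len] = segment - lr * (segment - predicted)
    return motion


def roughness(motion):
    """Mean squared frame-to-frame difference; lower means smoother."""
    return float(np.mean(np.diff(motion, axis=0) ** 2))


# Two smooth toy "sub-motions" joined by a random 16-frame transition.
rng = np.random.default_rng(0)
t = np.linspace(0.0, np.pi, 24)[:, None]
walk = np.hstack([np.sin(t), np.cos(t)])
wave = np.hstack([np.cos(2 * t), np.sin(2 * t)])
coarse = assemble_coarse_sequence([walk, wave], transition_len=16, rng=rng)
refined = refine_long_motion(coarse)
print("roughness before:", roughness(coarse))
print("roughness after: ", roughness(refined))
```

Because the windows overlap (stride smaller than window length), every frame, including those at window boundaries, is revisited from more than one window. That overlap is what lets a prior that only ever sees short segments influence the consistency of the whole sequence.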
Taxonomy
Topics: Human Pose and Action Recognition · Human Motion and Animation · Advanced Vision and Imaging
Methods: Non Maximum Suppression · 1x1 Convolution · Convolution · SSD · Attentive Walk-Aggregating Graph Neural Network · Diffusion
