Stylized Text-to-Motion Generation via Hypernetwork-Driven Low-Rank Adaptation
Junhyuk Jeon, Seokhyeon Hong, Junyong Noh

TL;DR
This paper introduces a lightweight, hypernetwork-driven approach for stylized text-to-motion generation that enhances style diversity, generalization, and efficiency without extensive fine-tuning.
Contribution
It proposes a novel framework in which a hypernetwork generates LoRA parameters that dynamically modulate a pretrained text-to-motion diffusion model, improving style control and generalization.
Findings
Achieves state-of-the-art stylization results on the HumanML3D and 100STYLE datasets.
Improves generalization to unseen styles compared to existing methods.
Supports optimization-based guidance without predefined style categories.
Abstract
Text-driven motion diffusion models are capable of generating realistic human motions, but text alone often struggles to express fine-level nuances of motion, commonly referred to as style. Recent approaches have tackled this challenge by attaching a style injection mechanism to a pretrained text-driven diffusion model. Existing stylization methods, however, either require style-specific fine-tuning of existing models or rely on heavy ControlNet-based architectures, limiting efficiency and generalization to unseen styles. We propose a lightweight style conditioning framework that dynamically modulates a pretrained diffusion model through hypernetwork-generated LoRA parameters. A style reference motion is encoded into a global style embedding, which is mapped by a hypernetwork to low-rank updates applied at each denoising step of the diffusion model. By structuring the style latent space…
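The abstract describes a conditioning path with four stages. A reference motion is encoded into a global style embedding. A hypernetwork maps that embedding to low-rank factors. Those factors then update a frozen pretrained layer at each denoising step.

Below is a minimal NumPy sketch of that path, written under our own assumptions. It is not the authors' implementation. The following pieces are hypothetical stand-ins chosen for illustration:

- the mean-pooled linear `StyleEncoder`;
- the linear `LoRAHypernetwork`;
- all dimensions;
- the `alpha / RANK` scaling;
- the toy update rule inside `denoise`.

Only the overall structure follows the description above: a frozen weight W plus a style-dependent low-rank term BA.

```python
import numpy as np

# Hypothetical sizes for illustration only (not taken from the paper).
POSE_DIM, STYLE_DIM, D_IN, D_OUT, RANK = 12, 8, 16, 16, 4


class StyleEncoder:
    """Toy style encoder (assumption): per-frame linear projection, then
    mean-pooling over time to get one global style embedding."""

    def __init__(self, rng):
        self.proj = rng.normal(0.0, 0.1, (POSE_DIM, STYLE_DIM))

    def __call__(self, motion):  # motion: (frames, POSE_DIM)
        return np.tanh(motion @ self.proj).mean(axis=0)  # (STYLE_DIM,)


class LoRAHypernetwork:
    """Toy hypernetwork (assumption): maps a style embedding to LoRA factors
    A (RANK x D_IN) and B (D_OUT x RANK) for one frozen linear layer."""

    def __init__(self, rng):
        self.to_a = rng.normal(0.0, 0.1, (STYLE_DIM, RANK * D_IN))
        self.to_b = rng.normal(0.0, 0.1, (STYLE_DIM, D_OUT * RANK))

    def __call__(self, s):
        a = (s @ self.to_a).reshape(RANK, D_IN)
        b = (s @ self.to_b).reshape(D_OUT, RANK)
        return a, b


def lora_linear(x, w_frozen, a, b, alpha=1.0):
    """Compute y = x (W + (alpha / r) B A)^T; the pretrained W is only read."""
    delta = (alpha / RANK) * (b @ a)  # (D_OUT, D_IN), rank <= RANK
    return x @ (w_frozen + delta).T


def denoise(x_t, w_frozen, a, b, steps=10):
    """Toy denoising loop standing in for the pretrained denoiser: the same
    style-conditioned low-rank update is applied at every step."""
    for _ in range(steps):
        x_t = x_t - 0.1 * lora_linear(x_t, w_frozen, a, b)
    return x_t


rng = np.random.default_rng(0)
encoder, hyper = StyleEncoder(rng), LoRAHypernetwork(rng)
w_frozen = rng.normal(0.0, 0.1, (D_OUT, D_IN))  # frozen pretrained weight

style_ref = rng.normal(size=(60, POSE_DIM))  # stand-in style reference motion
s = encoder(style_ref)                       # global style embedding
a, b = hyper(s)                              # style-specific LoRA factors
x_T = rng.normal(size=(40, D_IN))            # noisy motion, 40 frames
x_0 = denoise(x_T, w_frozen, a, b)
print(s.shape, a.shape, b.shape, x_0.shape)  # (8,) (4, 16) (16, 4) (40, 16)
```

Changing the style only requires a new reference motion, which yields new A and B. The frozen weight itself is never overwritten. This is the property the abstract contrasts with style-specific fine-tuning.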
