MoZoo: Unleashing Video Diffusion power in animal fur and muscle simulation
Dongxia Liu, Jie Ma, Xiaochen Yang, Jiancheng Zhang, Bin Xia, Zhehan Kan, Nisha Huang, Jun Liang, Wenming Yang, Jin Li

TL;DR
MoZoo is a generative dynamics solver that synthesizes high-fidelity animal fur and muscle videos from coarse meshes, using multimodal guidance, role-aware synchronization, and synthetic data to improve realism and efficiency.
Contribution
The paper presents MoZoo, which combines Role-Aware RoPE (RAR-RoPE) for role-aware synchronization, Asymmetric Decoupled Attention, and a synthetic data pipeline for realistic animal simulation.
Findings
MoZoo achieves high-fidelity fur and muscle simulation across various animals.
The method maintains superior temporal and structural consistency.
MoZoo outperforms existing techniques in realism and computational efficiency.
Abstract
The creation of cinematic-quality animal effects necessitates the precise modeling of muscle and fur dynamics, a process that remains both labor-intensive and computationally expensive within traditional production workflows. While generative diffusion models have shown promise in diverse artistic workflows, their capacity for high-fidelity animal simulation remains largely unexploited. We present MoZoo, a generative dynamics solver that bypasses conventional refinement to synthesize high-fidelity animal videos from coarse meshes under multimodal guidance. We propose Role-Aware RoPE (RAR-RoPE) which employs role-based index remapping to synchronize motion alignment while decoupling reference information via fixed temporal offsets. Complementing this, Asymmetric Decoupled Attention partitions the latent sequence to enforce a unidirectional information flow, effectively preventing feature…
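The two mechanisms named in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the role names ("target", "mesh", "reference"), the offset value, the direction of the one-way flow (conditions feed the generated tokens, not the reverse), and both function names are assumptions made for illustration only. The sketch shows temporal indices being remapped by role, so that mesh and target tokens of the same frame share a position while reference tokens sit at a fixed offset, and an attention mask that blocks condition tokens from reading target tokens.

```python
from typing import List

# Hypothetical role labels; the abstract does not name the actual roles.
MOTION_ROLES = ("target", "mesh")
REFERENCE_ROLE = "reference"


def remap_temporal_indices(roles: List[str], frames: List[int],
                           ref_offset: int = 1000) -> List[int]:
    """Assign each token a temporal position index based on its role.

    Assumption: target and mesh tokens of the same frame share one index
    (motion stays aligned), while reference tokens are shifted by a fixed
    offset so their positions never collide with the motion timeline.
    """
    indices = []
    for role, frame in zip(roles, frames):
        if role in MOTION_ROLES:
            indices.append(frame)
        elif role == REFERENCE_ROLE:
            indices.append(ref_offset + frame)
        else:
            raise ValueError(f"unknown role: {role}")
    return indices


def asymmetric_mask(roles: List[str]) -> List[List[bool]]:
    """Build mask[q][k], True where query token q may attend to key token k.

    Assumption: target tokens read every token, while condition tokens
    (mesh, reference) never read target tokens, so information flows one
    way, from the conditions into the generated sequence.
    """
    return [
        [q_role == "target" or k_role != "target" for k_role in roles]
        for q_role in roles
    ]


# Toy sequence: two target frames, two mesh frames, one reference image.
roles = ["target", "target", "mesh", "mesh", "reference"]
frames = [0, 1, 0, 1, 0]

indices = remap_temporal_indices(roles, frames)
mask = asymmetric_mask(roles)

print(indices)   # [0, 1, 0, 1, 1000]
print(mask[0])   # target row: may attend to all five tokens
print(mask[2])   # mesh row: target columns are blocked
```

In a real model the remapped indices would feed the rotary position embedding and the mask would be passed to the attention layers; both steps are omitted here because the abstract does not specify them.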
