Consistent and Controllable Image Animation with Motion Linear Diffusion Transformers
Xin Ma, Yaohui Wang, Genyun Jia, Xinyuan Chen, Tien-Tsin Wong, Cunjian Chen

TL;DR
MiraMo is an image animation framework that improves appearance consistency, motion smoothness, and controllability by combining linear attention, motion residual learning, and DCT-based noise refinement, and it is reported to outperform existing methods.
Contribution
The paper introduces MiraMo, a new approach combining linear attention, motion residual learning, and DCT-based noise refinement to enhance efficiency and quality in image animation.
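This summary does not say how the DCT-based noise refinement works. One common form of the idea keeps the low-frequency DCT band of a reference (the input image's latent) and the high-frequency band of the sampled Gaussian noise, so the starting noise already carries the image's coarse layout. The NumPy sketch below assumes that low-frequency swap, a rectangular cutoff set by a hypothetical `keep` ratio, and per-frame application over a (T, H, W) noise tensor. `dct_matrix`, `low_freq_refine`, and `keep` are illustrative names, not MiraMo's code.

```python
import numpy as np

def dct_matrix(n: int) -> np.ndarray:
    """Orthonormal DCT-II basis: row u holds the u-th cosine basis vector."""
    i = np.arange(n)
    u = i[:, None]
    m = np.cos(np.pi * (2 * i[None, :] + 1) * u / (2 * n))
    m[0] *= 1 / np.sqrt(2)          # scale the DC row so the matrix is orthonormal
    return m * np.sqrt(2 / n)

def dct2(x: np.ndarray, ch: np.ndarray, cw: np.ndarray) -> np.ndarray:
    """2D DCT of an (H, W) array via separable basis matrices."""
    return ch @ x @ cw.T

def idct2(y: np.ndarray, ch: np.ndarray, cw: np.ndarray) -> np.ndarray:
    """Inverse 2D DCT: for an orthonormal basis the inverse is the transpose."""
    return ch.T @ y @ cw

def low_freq_refine(noise: np.ndarray, reference: np.ndarray, keep: float = 0.25) -> np.ndarray:
    """Hypothetical refinement: in every frame of `noise` (T, H, W), replace the
    lowest `keep` fraction of frequencies per spatial axis with those of
    `reference` (H, W); the high-frequency band of the noise is kept as is.
    At least the DC coefficient is always replaced."""
    t, h, w = noise.shape
    ch, cw = dct_matrix(h), dct_matrix(w)
    mask = np.zeros((h, w))
    mask[: max(1, int(h * keep)), : max(1, int(w * keep))] = 1.0
    ref_freq = dct2(reference, ch, cw)
    out = np.empty_like(noise)
    for f in range(t):
        noise_freq = dct2(noise[f], ch, cw)
        out[f] = idct2(mask * ref_freq + (1.0 - mask) * noise_freq, ch, cw)
    return out

rng = np.random.default_rng(0)
noise = rng.standard_normal((16, 32, 32))     # 16 frames of 32x32 latent noise
reference = rng.standard_normal((32, 32))     # stand-in for an image latent
refined = low_freq_refine(noise, reference, keep=0.25)
print(refined.shape)  # (16, 32, 32)
```

Because the orthonormal DCT is exactly invertible, coefficients outside the mask pass through unchanged, so only the coarse structure of the starting noise is altered. One might also diffuse the image latent to the first sampling timestep before using it as `reference`; that step is an assumption and is omitted here.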
Findings
Outperforms state-of-the-art methods in animation quality and consistency
Achieves faster inference with reduced computational complexity
Demonstrates versatility in motion transfer and video editing applications
Abstract
Image animation has seen significant progress, driven by the powerful generative capabilities of diffusion models. However, maintaining appearance consistency with static input images and mitigating abrupt motion transitions in generated animations remain persistent challenges. While text-to-video (T2V) generation has demonstrated impressive performance with diffusion transformer models, the image animation field still largely relies on U-Net-based diffusion models, which lag behind the latest T2V approaches. Moreover, the quadratic complexity of vanilla self-attention mechanisms in Transformers imposes heavy computational demands, making image animation particularly resource-intensive. To address these issues, we propose MiraMo, a framework designed to enhance efficiency, appearance consistency, and motion smoothness in image animation. Specifically, MiraMo introduces three key…
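To make the complexity argument concrete: softmax attention over N tokens forms an N × N score matrix, so its cost grows quadratically with the number of spatio-temporal tokens. Linear attention replaces the softmax with a positive feature map φ and regroups the product as φ(Q)(φ(K)ᵀV), which never materializes that matrix and costs O(N d²). The sketch below assumes the elu(x)+1 feature map used in kernel-based linear attention; the exact linear attention variant in MiraMo is not given in this summary. It checks that the regrouped form matches the explicit quadratic computation.

```python
import numpy as np

def feature_map(x: np.ndarray) -> np.ndarray:
    """elu(x) + 1, written to avoid overflow: x + 1 for x > 0, exp(x) otherwise."""
    return np.where(x > 0, x + 1.0, np.exp(np.minimum(x, 0.0)))

def quadratic_kernel_attention(q, k, v):
    """Reference form: builds the full (N, N) score matrix, O(N^2 d) time and O(N^2) memory."""
    qf, kf = feature_map(q), feature_map(k)
    scores = qf @ kf.T                               # (N, N)
    return (scores @ v) / scores.sum(axis=1, keepdims=True)

def linear_attention(q, k, v):
    """Same output via associativity: phi(Q) (phi(K)^T V), O(N d^2) time, no (N, N) matrix."""
    qf, kf = feature_map(q), feature_map(k)
    kv = kf.T @ v                                    # (d, d_v), summed over all N tokens
    z = kf.sum(axis=0)                               # (d,), shared normalizer
    return (qf @ kv) / (qf @ z)[:, None]

rng = np.random.default_rng(0)
n_tokens, d = 256, 32
q, k, v = (rng.standard_normal((n_tokens, d)) for _ in range(3))
out_lin = linear_attention(q, k, v)
out_ref = quadratic_kernel_attention(q, k, v)
print(out_lin.shape, np.allclose(out_lin, out_ref))  # (256, 32) True
```

With N = 256 the saving is small, but the N × N matrix is what drives time and memory for long video token sequences, and the regrouped form avoids building it.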
Taxonomy
Topics: Generative Adversarial Networks and Image Synthesis · Human Motion and Animation · Computer Graphics and Visualization Techniques
