Consistent and Controllable Image Animation with Motion Linear Diffusion Transformers

Xin Ma; Yaohui Wang; Genyun Jia; Xinyuan Chen; Tien-Tsin Wong; Cunjian Chen

arXiv:2508.07246 · cs.CV · August 12, 2025

TL;DR

MiraMo is an image animation framework that targets appearance consistency, motion smoothness, and controllability by combining linear attention, motion residual learning, and DCT-based noise refinement. The authors report that it outperforms existing methods.

Contribution

The paper introduces MiraMo, a new approach combining linear attention, motion residual learning, and DCT-based noise refinement to enhance efficiency and quality in image animation.

Findings

01. Outperforms state-of-the-art methods in animation quality and consistency
02. Achieves faster inference with reduced computational complexity
03. Demonstrates versatility in motion transfer and video editing applications

Abstract

Image animation has seen significant progress, driven by the powerful generative capabilities of diffusion models. However, maintaining appearance consistency with static input images and mitigating abrupt motion transitions in generated animations remain persistent challenges. While text-to-video (T2V) generation has demonstrated impressive performance with diffusion transformer models, the image animation field still largely relies on U-Net-based diffusion models, which lag behind the latest T2V approaches. Moreover, the quadratic complexity of vanilla self-attention mechanisms in Transformers imposes heavy computational demands, making image animation particularly resource-intensive. To address these issues, we propose MiraMo, a framework designed to enhance efficiency, appearance consistency, and motion smoothness in image animation. Specifically, MiraMo introduces three key…
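
The abstract's point about the quadratic cost of vanilla self-attention can be illustrated with a minimal, dependency-free sketch. The truncated abstract does not say which linear attention variant MiraMo uses, so the `elu(x) + 1` feature map below is an assumption borrowed from the general linear-attention literature, and all function names are hypothetical. The sketch only shows that reordering the matrix product gives the same output while avoiding the N x N similarity matrix.

```python
import math

def feature_map(x):
    # elu(x) + 1, a positive feature map (an assumption, not MiraMo's stated choice)
    return [v + 1.0 if v > 0 else math.exp(v) for v in x]

def quadratic_kernel_attention(Q, K, V):
    """Computes every query-key similarity explicitly: O(N^2 * d) time."""
    fq = [feature_map(q) for q in Q]
    fk = [feature_map(k) for k in K]
    out = []
    for q in fq:
        weights = [sum(a * b for a, b in zip(q, k)) for k in fk]
        total = sum(weights)
        out.append([sum(w * v[c] for w, v in zip(weights, V)) / total
                    for c in range(len(V[0]))])
    return out

def linear_attention(Q, K, V):
    """Same result via phi(Q) (phi(K)^T V): O(N * d^2) time, no N x N matrix."""
    fq = [feature_map(q) for q in Q]
    fk = [feature_map(k) for k in K]
    d, dv = len(fk[0]), len(V[0])
    kv = [[0.0] * dv for _ in range(d)]   # running sum of phi(k_j) v_j^T
    k_sum = [0.0] * d                     # running sum of phi(k_j), for normalization
    for k, v in zip(fk, V):
        for i in range(d):
            k_sum[i] += k[i]
            for c in range(dv):
                kv[i][c] += k[i] * v[c]
    out = []
    for q in fq:
        norm = sum(q[i] * k_sum[i] for i in range(d))
        out.append([sum(q[i] * kv[i][c] for i in range(d)) / norm
                    for c in range(dv)])
    return out

# Toy tokens (made-up values): 3 tokens, feature dimension 2
Q = [[0.1, -0.2], [0.5, 0.3], [-0.4, 0.9]]
K = [[0.2, 0.1], [-0.3, 0.7], [0.6, -0.5]]
V = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
full = quadratic_kernel_attention(Q, K, V)
fast = linear_attention(Q, K, V)
```

Both functions return the same values up to floating-point error. The difference is that the second one summarizes all keys and values once in a d x d_v matrix, so its cost grows linearly with the number of tokens N, which matters for video where N spans every frame.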

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Human Motion and Animation · Computer Graphics and Visualization Techniques