DreamActor-M2: Universal Character Image Animation via Spatiotemporal In-Context Learning

Mingshuang Luo; Shuang Liang; Zhengkun Rong; Yuxuan Luo; Tianshu Hu; Ruibing Hou; Hong Chang; Yong Li; Yuan Zhang; Mingyuan Gao

arXiv:2601.21716 · cs.CV · January 30, 2026

Open Access

TL;DR

DreamActor-M2 introduces a universal character animation method that leverages in-context learning and a novel data synthesis pipeline to improve motion transfer fidelity and generalization across diverse characters.

Contribution

DreamActor-M2 recasts motion conditioning as an in-context learning problem and uses a self-bootstrapped data synthesis pipeline to improve cross-character animation.

Findings

01. Achieves state-of-the-art visual fidelity in character animation.

02. Demonstrates robust generalization across diverse characters and motions.

03. Introduces AW Bench, a comprehensive benchmark for evaluation.

Abstract

Character image animation aims to synthesize high-fidelity videos by transferring motion from a driving sequence to a static reference image. Despite recent advancements, existing methods suffer from two fundamental challenges: (1) suboptimal motion injection strategies that lead to a trade-off between identity preservation and motion consistency, manifesting as a "see-saw", and (2) an over-reliance on explicit pose priors (e.g., skeletons), which inadequately capture intricate dynamics and hinder generalization to arbitrary, non-humanoid characters. To address these challenges, we present DreamActor-M2, a universal animation framework that reimagines motion conditioning as an in-context learning problem. Our approach follows a two-stage paradigm. First, we bridge the input modality gap by fusing reference appearance and motion cues into a unified latent space, enabling the model to…

Taxonomy

Topics: Generative Adversarial Networks and Image Synthesis · Human Motion and Animation · Multimodal Machine Learning Applications