MultiAnimate: Pose-Guided Image Animation Made Extensible

Yingcheng Hu; Haowen Gong; Chuanguang Yang; Zhulin An; Yongjun Xu; Songhua Liu

arXiv:2602.21581 · cs.CV · May 12, 2026

TL;DR

MultiAnimate introduces a diffusion transformer-based framework with novel components for scalable, multi-character pose-guided image animation, achieving state-of-the-art results and generalizing beyond training scenarios.

Contribution

The paper presents an extensible multi-character animation framework built around two components, Identifier Assigner and Identifier Adapter, which enable generalization and improved performance.

Findings

1. Achieves state-of-the-art multi-character animation results.

2. Generalizes from two-character training to multi-character scenarios.

3. Outperforms existing diffusion-based baselines.

Abstract

Pose-guided human image animation aims to synthesize realistic videos of a reference character driven by a sequence of poses. While diffusion-based methods have achieved remarkable success, most existing approaches are limited to single-character animation. We observe that naively extending these methods to multi-character scenarios often leads to identity confusion and implausible occlusions between characters. To address these challenges, we propose an extensible multi-character image animation framework built upon modern Diffusion Transformers (DiTs) for video generation. At its core, our framework introduces two novel components, Identifier Assigner and Identifier Adapter, which collaboratively capture per-person positional cues and inter-person spatial relationships. This mask-driven scheme, along with a scalable training strategy, not only enhances flexibility but…
