EchoMimicV3: 1.3B Parameters are All You Need for Unified Multi-Modal and Multi-Task Human Animation

Rang Meng; Yan Wang; Weipeng Wu; Ruobing Zheng; Yuming Li; Chenguang Ma

arXiv:2507.03905 · cs.CV · March 3, 2026

TL;DR

EchoMimicV3 is an efficient human animation framework that uses a single 1.3B-parameter model to handle multiple animation tasks and input modalities, avoiding the cost of a separate model per task while remaining competitive in quality.

Contribution

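The paper introduces a unified framework for multi-task, multi-modal human animation built on three components: a Soup-of-Tasks paradigm, which uses multi-task mask inputs and a counter-intuitive task allocation strategy; a Soup-of-Modals paradigm for combining input modalities; and a new training and inference strategy.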

Findings

1. Achieves competitive performance with a compact 1.3B-parameter model.
2. Effectively unifies multi-task and multi-modal human animation.
3. Demonstrates efficiency and versatility in extensive experiments.

Abstract

Recent work on human animation usually incorporates large-scale video models, thereby achieving more vivid performance. However, the practical use of such methods is hindered by the slow inference speed and high computational demands. Moreover, traditional work typically employs separate models for each animation task, increasing costs in multi-task scenarios and worsening the dilemma. To address these limitations, we introduce EchoMimicV3, an efficient framework that unifies multi-task and multi-modal human animation. At the core of EchoMimicV3 lies a threefold design: a Soup-of-Tasks paradigm, a Soup-of-Modals paradigm, and a novel training and inference strategy. The Soup-of-Tasks leverages multi-task mask inputs and a counter-intuitive task allocation strategy to achieve multi-task gains without multi-model pains. Meanwhile, the Soup-of-Modals introduces a Coupled-Decoupled…
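The abstract names multi-task mask inputs as the mechanism that lets one model cover several animation tasks, but it does not describe the mask layout. The sketch below is a hypothetical illustration of that general idea, not the authors' implementation. It assumes a per-frame binary mask (1 = frame supplied as a condition, 0 = frame to generate) and assumes the task names `generate`, `image_to_video`, `extend`, and `interpolate`; neither the names nor the layout come from the paper. Appending the mask to the masked conditioning frames is one common conditioning pattern in video diffusion models; whether EchoMimicV3 uses exactly this pattern is an assumption.

```python
from typing import List

# HYPOTHETICAL task names and mask layout; the abstract does not specify either.
def task_mask(task: str, num_frames: int, context: int = 1) -> List[int]:
    """Per-frame mask: 1 = frame supplied as a condition, 0 = frame to generate."""
    if num_frames < 1:
        raise ValueError("num_frames must be positive")
    mask = [0] * num_frames
    if task == "generate":            # nothing supplied: synthesize every frame
        pass
    elif task == "image_to_video":    # reference frame supplied, rest generated
        mask[0] = 1
    elif task == "extend":            # first `context` frames supplied, continue the clip
        for i in range(min(context, num_frames)):
            mask[i] = 1
    elif task == "interpolate":       # first and last frames supplied, fill the middle
        mask[0] = 1
        mask[-1] = 1
    else:
        raise ValueError(f"unknown task: {task}")
    return mask


def build_condition(latents: List[List[float]], mask: List[int]) -> List[List[float]]:
    """Zero out frames to be generated and append the mask bit as an extra channel,
    so a single network can read which task it is solving from its input."""
    if len(latents) != len(mask):
        raise ValueError("one mask entry per frame is required")
    return [[x if m else 0.0 for x in frame] + [float(m)]
            for frame, m in zip(latents, mask)]


if __name__ == "__main__":
    latents = [[0.5, -1.0] for _ in range(4)]    # toy 4-frame, 2-channel latents
    print(task_mask("interpolate", 4))            # [1, 0, 0, 1]
    print(build_condition(latents, task_mask("image_to_video", 4))[:2])
    # [[0.5, -1.0, 1.0], [0.0, 0.0, 0.0]]
```

Because the task is encoded only in the input, switching tasks changes the data fed to the network rather than its weights, which is the kind of property that lets one model stand in for several per-task models.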

Videos

EchoMimicV3: 1.3B Parameters Are All You Need for Unified Multi-Modal and Multi-Task Human Animation· underline

Taxonomy

Topics: Augmented Reality Applications · Anatomy and Medical Technology · 3D Shape Modeling and Analysis