MegActor-$\Sigma$: Unlocking Flexible Mixed-Modal Control in Portrait Animation with Diffusion Transformer
Shurong Yang, Huadong Li, Juhao Wu, Minhao Jing, Linze Li, Renhe Ji, Jiajun Liang, Haoqiang Fan, Jin Wang

TL;DR
This paper introduces MegActor-$\Sigma$, a diffusion transformer model that enables flexible mixed-modal control in portrait animation by integrating audio and visual signals and balancing their influence.
Contribution
The paper presents a novel mixed-modal control framework with a diffusion transformer, including new training and inference strategies for flexible control of portrait animations.
Findings
Outperforms previous methods in portrait animation quality.
Effectively balances audio and visual control signals.
Enables adjustable motion amplitude for personalized animations.
Abstract
Diffusion models have demonstrated superior performance in the field of portrait animation. However, current approaches rely on either the visual or the audio modality to control character movements, failing to exploit the potential of mixed-modal control. This challenge arises from the difficulty of balancing the weak control strength of the audio modality against the strong control strength of the visual modality. To address this issue, we introduce MegActor-$\Sigma$: a mixed-modal conditional diffusion transformer (DiT), which can flexibly inject audio and visual control signals into portrait animation. Specifically, we make substantial advancements over its predecessor, MegActor, by leveraging the promising model structure of DiT and integrating audio and visual conditions through advanced modules within the DiT framework. To further achieve flexible combinations of mixed-modal control…
Taxonomy
Topics: Computer Graphics and Visualization Techniques · Human Motion and Animation · 3D Shape Modeling and Analysis
Methods: Diffusion
