MegActor-$\Sigma$: Unlocking Flexible Mixed-Modal Control in Portrait Animation with Diffusion Transformer

Shurong Yang; Huadong Li; Juhao Wu; Minhao Jing; Linze Li; Renhe Ji; Jiajun Liang; Haoqiang Fan; Jin Wang

arXiv:2408.14975·cs.CV·August 28, 2024

TL;DR

This paper introduces MegActor-$\Sigma$, a diffusion transformer model that enables flexible mixed-modal control in portrait animation by integrating audio and visual signals and balancing their influence.

Contribution

The paper presents a novel mixed-modal control framework with a diffusion transformer, including new training and inference strategies for flexible control of portrait animations.

Findings

01 Outperforms previous methods in portrait animation quality.

02 Effectively balances audio and visual control signals.

03 Enables adjustable motion amplitude for personalized animations.

Abstract

Diffusion models have demonstrated superior performance in the field of portrait animation. However, current approaches rely on either the visual or the audio modality to control character movements, failing to exploit the potential of mixed-modal control. This challenge arises from the difficulty of balancing the weak control strength of the audio modality against the strong control strength of the visual modality. To address this issue, we introduce MegActor-$\Sigma$: a mixed-modal conditional diffusion transformer (DiT), which can flexibly inject audio and visual control signals into portrait animation. Specifically, we make substantial advancements over its predecessor, MegActor, by leveraging the promising model structure of DiT and integrating audio and visual conditions through advanced modules within the DiT framework. To further achieve flexible combinations of mixed-modal control…

Taxonomy

Topics: Computer Graphics and Visualization Techniques · Human Motion and Animation · 3D Shape Modeling and Analysis

Methods: Diffusion