MIRRORTALK: Forging Personalized Avatars Via Disentangled Style and Hierarchical Motion Control

Renjie Lu; Xulong Zhang; Xiaoyang Qu; Jianzong Wang; Shangfei Wang

arXiv:2601.22501·cs.CV·February 2, 2026

Open Access

TL;DR

MirrorTalk is a novel generative framework that uses disentangled style encoding and hierarchical diffusion to synthesize personalized talking faces with accurate lip-sync and expressive dynamics, preserving individual speaker styles.

Contribution

The paper introduces a Semantically-Disentangled Style Encoder and hierarchical modulation in a diffusion model for personalized talking face synthesis, addressing style-content entanglement issues.

Findings

01 Outperforms state-of-the-art methods in lip-sync accuracy

02 Better preserves each speaker's personal talking style

03 Transfers talking style effectively from reference videos

Abstract

Synthesizing personalized talking faces that uphold and highlight a speaker's unique style while maintaining lip-sync accuracy remains a significant challenge. A primary limitation of existing approaches is the intrinsic confounding of speaker-specific talking style and semantic content within facial motions, which prevents the faithful transfer of a speaker's unique persona to arbitrary speech. In this paper, we propose MirrorTalk, a generative framework based on a conditional diffusion model, combined with a Semantically-Disentangled Style Encoder (SDSE) that can distill pure style representations from a brief reference video. To effectively utilize this representation, we further introduce a hierarchical modulation strategy within the diffusion process. This mechanism guides the synthesis by dynamically balancing the contributions of audio and style features across distinct facial…
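To make the idea of balancing audio and style features per facial region concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the authors' implementation: the region names, the fixed gate logits, the sigmoid gate, and the function names `blend_region` and `modulate` are all hypothetical, and the diffusion model itself is omitted. In the paper the balance is described as dynamic, so a real system would presumably predict the gates rather than hard-code them.

```python
import math


def sigmoid(x):
    """Map a real-valued logit to a gate value in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def blend_region(audio_feat, style_feat, gate_logit):
    """Convex blend of audio and style features for one facial region.

    A gate near 1 favors the audio feature; a gate near 0 favors the
    style feature. This gating form is an assumption for illustration.
    """
    if len(audio_feat) != len(style_feat):
        raise ValueError("audio and style features must have the same length")
    g = sigmoid(gate_logit)
    return [g * a + (1.0 - g) * s for a, s in zip(audio_feat, style_feat)]


def modulate(audio_feat, style_feat, region_gate_logits):
    """Apply a separate audio/style balance to each facial region."""
    return {
        region: blend_region(audio_feat, style_feat, logit)
        for region, logit in region_gate_logits.items()
    }


# Hypothetical gates: the mouth leans on audio (for lip-sync), the upper
# face leans on style (for speaker-specific expression).
REGION_GATE_LOGITS = {"mouth": 2.0, "upper_face": -2.0}

audio = [1.0, 0.0]  # toy audio feature
style = [0.0, 1.0]  # toy style feature
out = modulate(audio, style, REGION_GATE_LOGITS)
```

With these toy inputs, the first component of each output equals the gate value itself, so `out["mouth"]` is dominated by the audio feature and `out["upper_face"]` by the style feature.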

Taxonomy

Topics: Face recognition and analysis · Generative Adversarial Networks and Image Synthesis · Speech and Audio Processing