HM-Talker: Hybrid Motion Modeling for High-Fidelity Talking Head Synthesis

Shiyu Liu; Kui Jiang; Xianming Liu; Hongxun Yao; Xiaocheng Feng

arXiv:2508.10566 · cs.CV · October 31, 2025

TL;DR

HM-Talker introduces a hybrid motion modeling framework combining implicit and explicit cues, including anatomical facial muscle movements, to generate high-fidelity, temporally coherent talking head videos with improved lip-sync and visual quality.

Contribution

The paper presents HM-Talker, a novel framework that explicitly incorporates Action Units for better lip synchronization and cross-subject generalization in talking head synthesis.

Findings

01 Outperforms state-of-the-art methods in visual quality

02 Achieves superior lip-sync accuracy

03 Enhances cross-subject generalization

Abstract

Audio-driven talking head video generation enhances user engagement in human-computer interaction. However, current methods frequently produce videos with motion blur and lip jitter, primarily due to their reliance on implicit modeling of audio-facial motion correlations--an approach lacking explicit articulatory priors (i.e., anatomical guidance for speech-related facial movements). To overcome this limitation, we propose HM-Talker, a novel framework for generating high-fidelity, temporally coherent talking heads. HM-Talker leverages a hybrid motion representation combining both implicit and explicit motion cues. Explicit cues use Action Units (AUs), anatomically defined facial muscle movements, alongside implicit features to minimize phoneme-viseme misalignment. Specifically, our Cross-Modal Disentanglement Module (CMDM) extracts complementary implicit/explicit motion features while…
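
As an illustration only, the hybrid motion representation described in the abstract can be pictured as a per-frame vector that carries both kinds of cues. The sketch below is not the paper's CMDM or any other part of its pipeline. The FrameMotion type, the hybrid_motion function, the feature sizes, the normalization of AU intensities to [0, 1], and the use of plain concatenation as the fusion step are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class FrameMotion:
    """Motion cues for one video frame (hypothetical container, not from the paper)."""
    implicit: List[float]      # assumed: learned audio-to-motion embedding
    explicit_aus: List[float]  # assumed: Action Unit intensities scaled to [0, 1]


def hybrid_motion(frame: FrameMotion) -> List[float]:
    """Fuse implicit and explicit cues into a single vector.

    Concatenation is an assumption chosen for simplicity; the source does
    not state which fusion operator HM-Talker uses.
    """
    for intensity in frame.explicit_aus:
        if not 0.0 <= intensity <= 1.0:
            raise ValueError(f"AU intensity out of range: {intensity}")
    return list(frame.implicit) + list(frame.explicit_aus)


# Made-up values: a 3-dimensional implicit embedding and two AU intensities.
frame = FrameMotion(implicit=[0.12, -0.40, 0.88], explicit_aus=[0.0, 0.7])
hybrid = hybrid_motion(frame)
```

The point of the sketch is only that the explicit part stays interpretable: each trailing entry corresponds to one anatomically defined muscle movement, while the leading entries are whatever a learned encoder produced.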
