AsymTalker: Identity-Consistent Long-Term Talking Head Generation via Asymmetric Distillation

Yuxin Lu; Jiayang Sun; Guibo Zhu; Min Cao

arXiv:2605.02948 · cs.LG · May 12, 2026

TL;DR

AsymTalker is a diffusion-based method for generating long-term, identity-consistent talking head videos. It addresses temporal-spatial misalignment and identity drift with Temporal Reference Encoding and Asymmetric Knowledge Distillation.

Contribution

It introduces Temporal Reference Encoding and Asymmetric Knowledge Distillation to improve long-term coherence and identity preservation in talking head generation.

Findings

01. Achieves state-of-the-art results on HDTF and VFHQ datasets.

02. Guarantees high-fidelity, identity-consistent videos over 600 seconds.

03. Operates at a real-time speed of 66 FPS.

Abstract

Diffusion-based talking head generation has achieved remarkable visual quality, yet scaling it to long-term videos remains challenging. The widely adopted chunk-wise paradigm introduces two fundamental failures: (1) temporal-spatial misalignment between static identity references and dynamic audio streams, and (2) cascading identity drift propagated through self-generated continuity references across chunks. To address both issues, we propose AsymTalker, a novel diffusion-based talking head generation method comprising Temporal Reference Encoding (TRE) and Asymmetric Knowledge Distillation (AKD). First, TRE mitigates temporal-spatial misalignment by transforming the static identity image into a temporally coherent latent representation through encoding of a temporally replicated pseudo-video, without introducing additional parameters. Second, AKD resolves the inherent conditioning…
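
The pseudo-video construction mentioned for TRE can be illustrated with a minimal sketch. It shows only the replication step: a single static identity image is repeated along a new time axis to form a constant "video". The abstract does not say which encoder consumes this pseudo-video or how many frames are used, so the encoding step is omitted, and the names `Frame`, `replicate_to_pseudo_video`, and `num_frames` are hypothetical, chosen for illustration rather than taken from the paper.

```python
from typing import List

# Hypothetical stand-in for an image: an H x W grid of grayscale values.
Frame = List[List[float]]


def replicate_to_pseudo_video(image: Frame, num_frames: int) -> List[Frame]:
    """Repeat one static image num_frames times along a new time axis."""
    if num_frames < 1:
        raise ValueError("num_frames must be at least 1")
    # Copy every row so the frames do not share mutable state.
    return [[row[:] for row in image] for _ in range(num_frames)]


# Assumed toy input: a 2 x 2 "identity image" and 4 replicated frames.
identity_image: Frame = [[0.1, 0.2], [0.3, 0.4]]
pseudo_video = replicate_to_pseudo_video(identity_image, num_frames=4)

# Shape as (frames, height, width).
print(len(pseudo_video), len(pseudo_video[0]), len(pseudo_video[0][0]))
```

In a real pipeline the replicated frames would presumably be passed to a video encoder to obtain the latent representation; that part is not specified in the text above and is left out of the sketch.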
