Generate Your Talking Avatar from Video Reference

Zujin Guo; Zhenhui Ye; Yi Ren; Yuanming Li; Ce Chen; Zhibin Hong; Chen Change Loy

arXiv:2604.27918 · cs.CV · May 1, 2026

TL;DR

TAVR generates high-fidelity talking avatars from cross-scene video references. It addresses the limits of single-image, same-scene conditioning with a token selection module and a three-stage training scheme, and the paper adds a new benchmark for cross-scene robustness.

Contribution

The paper introduces a cross-scene video reference approach that combines a token selection module with a three-stage training scheme for robust avatar synthesis.

Findings

01. TAVR outperforms existing methods quantitatively and qualitatively.

02. The framework demonstrates strong cross-scene robustness.

03. A new benchmark with 158 cross-scene video pairs was created.

Abstract

Existing talking avatar methods typically adopt an image-to-video pipeline conditioned on a static reference image within the same scene as the target generation. This restricted, single-view perspective lacks sufficient temporal and expression cues, limiting the ability to synthesize high-fidelity talking avatars in customized backgrounds. To this end, we introduce Talking Avatar generation from Video Reference (TAVR), a novel framework that shifts the paradigm by leveraging cross-scene video inputs. To effectively process these extended temporal contexts and bridge cross-scene domain gaps, TAVR integrates a token selection module alongside a comprehensive three-stage training scheme. Specifically, same-scene video pretraining establishes foundational appearance copying, which is subsequently expanded by cross-scene reference fine-tuning for robust cross-scene adaptation. Finally,…
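
The abstract mentions a token selection module for handling long video references but, as excerpted here, does not say how tokens are chosen. As a rough illustration of the general idea only, the sketch below keeps the k highest-scoring reference tokens while preserving their temporal order. The function name `select_top_k_tokens`, the per-token relevance scores, and the top-k rule are all assumptions made for this example; they are not taken from the paper and should not be read as TAVR's actual module.

```python
from typing import List, Sequence


def select_top_k_tokens(
    tokens: Sequence[Sequence[float]],
    scores: Sequence[float],
    k: int,
) -> List[int]:
    """Return indices of the k highest-scoring tokens, in original order.

    Hypothetical helper: generic top-k selection, not the paper's method.
    """
    if len(tokens) != len(scores):
        raise ValueError("tokens and scores must have the same length")
    if k <= 0:
        return []
    k = min(k, len(tokens))
    # Rank indices by score (descending); ties keep the earlier token.
    ranked = sorted(range(len(scores)), key=lambda i: (-scores[i], i))
    # Re-sort the kept indices so selected tokens stay in temporal order.
    return sorted(ranked[:k])


# Toy example: six reference tokens with made-up relevance scores.
reference_tokens = [
    [0.1, 0.2], [0.9, 0.4], [0.3, 0.3],
    [0.8, 0.7], [0.2, 0.1], [0.5, 0.6],
]
relevance = [0.05, 0.90, 0.20, 0.75, 0.10, 0.60]

kept = select_top_k_tokens(reference_tokens, relevance, k=3)
selected = [reference_tokens[i] for i in kept]
print(kept)
```

Keeping the selected indices in their original order is a design choice for this sketch: it lets a downstream model still treat the reduced set as a time-ordered sequence. How scores would be produced (for example, from attention weights or a learned scorer) is left open here because the source does not specify it.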

Code & Models

Repositories

https://www.heygen.com/research
