Livatar-1: Real-Time Talking Heads Generation with Tailored Flow Matching

Haiyang Liu; Xiaolin Hong; Xuancheng Yang; Yudi Ruan; Xiang Lian; Michael Lingelbach; Hongwei Yi; Wei Li

arXiv:2507.18649·cs.CV·July 28, 2025

Open Access

TL;DR

Livatar is a real-time system for generating talking head videos driven by audio, achieving high lip-sync accuracy and low latency, making high-fidelity avatars accessible for various applications.

Contribution

The paper introduces a flow matching based framework for real-time talking head generation that improves lip-sync accuracy and system efficiency.

Findings

01. Achieves 8.50 LipSync Confidence on HDTF dataset

02. Reaches 141 FPS throughput with 0.17s latency on A10 GPU

03. Outperforms existing methods in lip-sync quality and speed
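
As a rough reading of the throughput and latency figures above, the short Python sketch below converts 141 FPS into a per-frame compute budget and a real-time factor. The 25 FPS playback rate is an assumption made only for illustration and is not stated in the source. The function and variable names are hypothetical.

```python
def per_frame_ms(throughput_fps: float) -> float:
    """Average compute time per generated frame, in milliseconds."""
    return 1000.0 / throughput_fps


def realtime_factor(throughput_fps: float, playback_fps: float) -> float:
    """How many times faster than playback the generator runs."""
    return throughput_fps / playback_fps


THROUGHPUT_FPS = 141.0  # reported throughput on a single A10 GPU
LATENCY_S = 0.17        # reported end-to-end latency
PLAYBACK_FPS = 25.0     # ASSUMPTION: playback frame rate, not given in the source

frame_ms = per_frame_ms(THROUGHPUT_FPS)
rtf = realtime_factor(THROUGHPUT_FPS, PLAYBACK_FPS)
# Number of playback frames that elapse during the reported latency.
latency_frames = LATENCY_S * PLAYBACK_FPS

print(f"per-frame budget: {frame_ms:.2f} ms")
print(f"real-time factor at {PLAYBACK_FPS:.0f} FPS playback: {rtf:.2f}x")
print(f"latency expressed in playback frames: {latency_frames:.2f}")
```

Under the assumed playback rate, this works out to roughly 7.1 ms per frame and a real-time factor of about 5.6, which suggests headroom for streaming use. A different playback rate changes the factor proportionally.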

Abstract

We present Livatar, a real-time audio-driven talking-head video generation framework. Existing baselines suffer from limited lip-sync accuracy and long-term pose drift. We address these limitations with a flow matching based framework. Coupled with system optimizations, Livatar achieves competitive lip-sync quality with an 8.50 LipSync Confidence on the HDTF dataset, and reaches a throughput of 141 FPS with an end-to-end latency of 0.17s on a single A10 GPU. This makes high-fidelity avatars accessible to broader applications. Our project is available at https://www.hedra.com/ with examples at https://h-liu1997.github.io/Livatar-1/
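
For readers unfamiliar with flow matching, the sketch below shows the generic idea in plain Python: a straight-line path between a noise sample and a data sample, a regression target equal to the constant velocity along that path, and Euler integration at sampling time. This is a minimal illustration of standard conditional flow matching under stated assumptions. It is not the tailored variant described in the paper, it omits audio conditioning entirely, and all function names are hypothetical.

```python
import random


def interpolate(x0, x1, t):
    """Point at time t on the straight line from noise x0 to data x1."""
    return [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]


def target_velocity(x0, x1):
    """Regression target: the constant velocity of the straight-line path."""
    return [b - a for a, b in zip(x0, x1)]


def flow_matching_loss(pred_v, target_v):
    """Mean squared error between predicted and target velocities."""
    return sum((p - q) ** 2 for p, q in zip(pred_v, target_v)) / len(target_v)


def euler_sample(velocity_fn, x0, num_steps):
    """Integrate dx/dt = v(x, t) from t=0 to t=1 with fixed Euler steps."""
    x = list(x0)
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = i * dt
        v = velocity_fn(x, t)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


# Toy setup: a single 3-D "data" point stands in for a motion feature vector.
rng = random.Random(0)
x1 = [0.5, -1.0, 2.0]
x0 = [rng.gauss(0.0, 1.0) for _ in x1]
t = 0.3  # fixed here for illustration; training would draw t uniformly from [0, 1)
xt = interpolate(x0, x1, t)
v_target = target_velocity(x0, x1)


def oracle_velocity(x, t):
    """Exact velocity field when the data distribution is the single point x1.

    A trained network would replace this function in practice.
    """
    return [(b - a) / (1.0 - t) for a, b in zip(x, x1)]


loss = flow_matching_loss(oracle_velocity(xt, t), v_target)
sample = euler_sample(oracle_velocity, x0, num_steps=4)
print("loss with oracle velocity:", loss)
print("sample after 4 Euler steps:", sample)
```

Because the path is a straight line, the oracle field is constant along each trajectory, so a handful of Euler steps already lands on the target. Few-step sampling of this kind is one reason flow matching is attractive for low-latency generation, though the step count and network used by Livatar are not given in this summary.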

Taxonomy

Topics: Face recognition and analysis · Speech and Audio Processing · Music Technology and Sound Studies