HDTR-Net: A Real-Time High-Definition Teeth Restoration Network for Arbitrary Talking Face Generation Methods

Yongyuan Li; Xiuyuan Qin; Chao Liang; Mingqiang Wei

arXiv:2309.07495·cs.CV·September 15, 2023

TL;DR

HDTR-Net is a real-time high-definition teeth restoration network that can be applied to the output of arbitrary talking face generation methods. It improves the visual quality of teeth regions while maintaining lip synchronization, and it runs faster than current super-resolution methods.

Contribution

The paper introduces HDTR-Net, a universal teeth restoration network that enhances teeth clarity in talking face videos in real time while maintaining lip synchronization and temporal consistency.

Findings

01 Achieves real-time high-definition teeth restoration.

02 Speeds up inference by 300% compared to state-of-the-art methods.

03 Maintains lip synchronization and frame coherence.
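
A percentage speed-up such as the 300% figure above can be read in two ways, and the listing does not say which one is meant. The sketch below converts a quoted percentage into a throughput multiplier under both readings and applies it to a per-frame latency. The function names and the 120 ms baseline are hypothetical and are not taken from the paper.

```python
def speedup_factor(percent: float, relative_increase: bool = True) -> float:
    """Convert a quoted percentage speed-up into a throughput multiplier.

    relative_increase=True  reads "300%" as "300% faster"   -> 4x throughput.
    relative_increase=False reads "300%" as "300% of base"  -> 3x throughput.
    """
    if relative_increase:
        return 1.0 + percent / 100.0
    return percent / 100.0


def new_latency_ms(baseline_ms: float, factor: float) -> float:
    """Per-frame latency after applying a throughput multiplier."""
    return baseline_ms / factor


# Hypothetical baseline latency per frame; NOT a number from the paper.
baseline_ms = 120.0

for relative in (True, False):
    factor = speedup_factor(300.0, relative)
    print(f"relative_increase={relative}: {factor:.1f}x, "
          f"{new_latency_ms(baseline_ms, factor):.1f} ms per frame")
```

Under either reading the restored frames arrive several times faster than the baseline; the two interpretations differ only in whether the multiplier is 3 or 4.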

Abstract

Talking Face Generation (TFG) aims to reconstruct facial movements, in particular highly natural lip movements, from audio and facial features that are latently connected. Existing TFG methods have made significant advances in producing natural and realistic images. However, most work rarely takes visual quality into consideration. It is challenging to ensure lip synchronization while avoiding visual quality degradation in cross-modal generation methods. To address this issue, we propose a universal High-Definition Teeth Restoration Network, dubbed HDTR-Net, for arbitrary TFG methods. HDTR-Net can enhance teeth regions at an extremely fast speed while maintaining synchronization and temporal consistency. In particular, we propose a Fine-Grained Feature Fusion (FGFF) module to effectively capture fine texture feature information around teeth and surrounding regions, and use these…

Code & Models

Repositories

yylgoodlucky/hdtr (PyTorch, official)

Taxonomy

Topics: Speech and Audio Processing · Face recognition and analysis · Generative Adversarial Networks and Image Synthesis

Methods: SPEED: Separable Pyramidal Pooling EncodEr-Decoder for Real-Time Monocular Depth Estimation on Low-Resource Settings